(54) Ketoreductase gene and protein from yeast

(57) The invention provides a cloned ketoreductase gene, vectors for expressing same, recombinant host cells that express said vector-borne gene, and a method for stereospecifically reducing a ketone using a recombinant ketoreductase, or a recombinant host cell that expresses a cloned ketoreductase gene.
[0001] This application claims the benefit of U.S. Provisional Application No. 60/064,195, filed November 4, 1997.
[0002] This invention relates to recombinant DNA technology. In particular, the invention pertains to the cloning of a ketoreductase gene from Zygosaccharomyces rouxii, and the use of recombinant hosts expressing fungal ketoreductase genes in a process for stereospecific reduction of ketones.

[0003] 2,3-Benzodiazepine derivatives are potent antagonists of the AMPA (α-amino-3-hydroxy-5-methylisoxazole-4-propionic acid) class of receptors in the mammalian central nervous system (see I. Tarnawa et al. in Amino Acids: Chemistry, Biology and Medicine, Eds. Lubec and Rosenthal, Leiden, 1990). These derivative compounds have potentially widespread applications as neuroprotective agents, particularly as anti-convulsants. One series of 2,3-benzodiazepines is considered particularly advantageous for such use, and this series of compounds has the following general formula:



wherein R is hydrogen or C^-C^q alkyl; and

X is hydrogen, C^-C^q alkyl, acyl, aryl, amido or carboxyl, or a substituted derivative thereof.

[0004] The clinical potential of these compounds has led to interest in developing more efficient synthetic methods. Biologically-based methods in which a ketoreductase enzyme provides a stereospecific reduction in a whole-cell process using fungal cells have been described in U.S. Patent application serial number 08/413,036.
[0005] The present invention provides isolated nucleic acid molecules that encode a ketoreductase enzyme from Z. rouxii. The invention also provides the protein product of said nucleic acid, in substantially purified form. Also provided are methods for the formation of chiral alcohols using a purified ketoreductase enzyme, or a recombinant host cell that expresses a fungal ketoreductase gene.

[0006] Having the cloned ketoreductase gene enables the production of recombinant ketoreductase protein and of recombinant host cells expressing said protein, which can be used in a stereospecific reduction of ketones.

[0007] In one embodiment the present invention relates to an isolated DNA molecule encoding ketoreductase protein, said DNA molecule comprising the nucleotide sequence identified as SEQ ID NO:1.

[0008] In another embodiment the present invention relates to a substantially purified ketoreductase protein molecule from Z. rouxii.

[0009] In another embodiment the present invention relates to a ketoreductase protein molecule from Z. rouxii, wherein said protein molecule comprises the sequence identified as SEQ ID NO:2.

[0010] In a further embodiment the present invention relates to a ribonucleic acid molecule encoding ketoreductase 
protein, said ribonucleic acid molecule comprising the sequence identified as SEQ ID NO:3. 
[0011] In yet another embodiment, the present invention relates to a recombinant DNA vector that incorporates a ketoreductase gene in operable linkage to gene expression sequences, enabling said gene to be transcribed and translated in a host cell.

[0012] In still another embodiment the present invention relates to host cells that have been transformed or transfected with a cloned ketoreductase gene such that said ketoreductase gene is expressed in the host cell.
[0013] In a still further embodiment, the present invention relates to a method for producing chiral alcohols using recombinant host cells that express an exogenously introduced ketoreductase gene.

[0014] In yet another embodiment, the present invention relates to a method for producing chiral alcohols using recombinant host cells that have been transformed or transfected with a ketoreductase gene from Z. rouxii or S. cerevisiae.
[0015] In yet another embodiment, the present invention relates to a method for producing chiral alcohols using a purified fungal ketoreductase.

Definitions 


[0016] 

SEQ ID NO:1 - SEQ ID NO:3 comprise the DNA, protein, and RNA sequences of ketoreductase from Z. rouxii.

SEQ ID NO:4 - SEQ ID NO:6 comprise the DNA, protein, and RNA sequences of gene YDR541c from S. cerevisiae.

SEQ ID NO:7 - SEQ ID NO:9 comprise the DNA, protein, and RNA sequences of YOL151w from S. cerevisiae.

SEQ ID NO:10 - SEQ ID NO:12 comprise the DNA, protein, and RNA sequences of YGL157w from S. cerevisiae.

SEQ ID NO:13 - SEQ ID NO:15 comprise the DNA, protein, and RNA sequences of YGL039w from S. cerevisiae.

[0017] The term "fusion protein" denotes a hybrid protein molecule not found in nature comprising a translational fusion or enzymatic fusion in which two or more different proteins or fragments thereof are covalently linked on a single polypeptide chain.

[0018] The term "plasmid" refers to an extrachromosomal genetic element. The starting plasmids herein are either commercially available, publicly available on an unrestricted basis, or can be constructed from available plasmids in accordance with published procedures. In addition, equivalent plasmids to those described are known in the art and will be apparent to the ordinarily skilled artisan.

[0019] "Recombinant DNA cloning vector" as used herein refers to any autonomously replicating agent, including, but not limited to, plasmids and phages, comprising a DNA molecule to which one or more additional DNA segments can be or have been added.

[0020] The term "recombinant DNA expression vector" or "expression vector" as used herein refers to any recombinant DNA cloning vector, for example a plasmid or phage, in which a promoter and other regulatory elements are present, thereby enabling transcription of an inserted DNA.

[0021] The term "vector" as used herein refers to a nucleic acid compound used for introducing exogenous DNA into host cells. A vector comprises a nucleotide sequence which may encode one or more protein molecules. Plasmids, cosmids, viruses, and bacteriophages, in the natural state or which have undergone recombinant engineering, are examples of commonly used vectors.

[0022] The terms "complementary" or "complementarity" as used herein refer to the capacity of purine and pyrimidine nucleotides to associate through hydrogen bonding in double-stranded nucleic acid molecules. The following base pairs are complementary: guanine and cytosine; adenine and thymine; and adenine and uracil. As used herein, "complementary" means that at least one of two hybridizing strands is fully base-paired with the other member of said hybridizing strands, and there are no mismatches. Moreover, at each nucleotide position of said one strand, an "A" is paired with a "T", a "T" is paired with an "A", a "G" is paired with a "C", and a "C" is paired with a "G".
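As a minimal illustration of the strict sense of "complementary" defined above (every position paired, no mismatches), the Python sketch below checks two DNA strands. It covers DNA only (A-T, G-C), assumes both strands are the same length and written 5' to 3', and the sequences are arbitrary examples rather than fragments of SEQ ID NO:1.

```python
# Watson-Crick pairing rules for DNA, as listed in the definition above.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the strand that pairs with `seq`, written 5'->3' (antiparallel)."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def fully_complementary(strand_a: str, strand_b: str) -> bool:
    """True if two equal-length antiparallel strands pair at every position with no mismatch."""
    return len(strand_a) == len(strand_b) and reverse_complement(strand_a) == strand_b.upper()


print(reverse_complement("ATGCGT"))              # ACGCAT
print(fully_complementary("ATGCGT", "ACGCAT"))  # True
print(fully_complementary("ATGCGT", "ACGCAA"))  # False (one mismatch)
```

Writing the partner strand as a reverse complement keeps both strands in the conventional 5'->3' orientation, so a single string comparison detects any mismatch.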
[0023] "Isolated nucleic acid compound" refers to any RNA or DNA sequence, however constructed or synthesized, which is locationally distinct from its natural location.

[0024] A "primer" is a nucleic acid fragment which functions as an initiating substrate for enzymatic or synthetic elongation of, for example, a nucleic acid molecule.

[0025] The term "promoter" refers to a DNA sequence which directs transcription of DNA to RNA. An inducible promoter is one that is regulatable by environmental signals, such as carbon source, heat, metal ions, chemical inducers, etc.; a constitutive promoter generally drives expression at a constant level and is not regulatable.
[0026] A "probe" as used herein is a labeled nucleic acid compound which can hybridize with another nucleic acid compound.

[0027] The term "hybridization" as used herein refers to a process in which a single-stranded nucleic acid molecule joins with a complementary strand through nucleotide base pairing. "Selective hybridization" refers to hybridization under conditions of high stringency. The degree of hybridization depends upon, for example, the degree of complementarity, the stringency of hybridization, and the length of the hybridizing strands.

[0028] "Substantially identical" means a sequence having sufficient homology to hybridize under stringent conditions 
and/or be at least 90% identical to a sequence disclosed herein. 
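The 90% identity criterion can be made concrete with a short calculation. The sketch below is one common convention, assumed here rather than stated in the text: sequences are already aligned, gap columns ("-") are excluded, and identity is matches divided by the remaining columns. Other tools count gaps differently, so the same pair can score differently elsewhere.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns (gaps excluded) at which two pre-aligned sequences match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must already be aligned to the same length")
    # Keep only columns where neither sequence has a gap character.
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper()) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """Apply the 'at least 90% identical' test from the definition above."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))         # 90.0
print(meets_identity_threshold("ACGTACGTAC", "ACGTACGTAA"))  # True
```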

[0029] The term "stringency" relates to nucleic acid hybridization conditions. High stringency conditions disfavor non-homologous base pairing. Low stringency conditions have the opposite effect. Stringency may be altered, for example, by changes in temperature, denaturants, and salt concentration. Typical high stringency conditions comprise hybridizing at 50°C to 65°C in 5X SSPE and 50% formamide, and washing at 50°C to 65°C in 0.5X SSPE; typical low stringency conditions comprise hybridizing at 35°C to 37°C in 5X SSPE and 40% to 45% formamide, and washing at 42°C in 1X-2X SSPE.
[0030] "SSPE" denotes a hybridization and wash solution comprising sodium chloride, sodium phosphate, and EDTA, at pH 7.4. A 20X solution of SSPE is made by dissolving 174 g of NaCl, 27.6 g of NaH2PO4·H2O, and 7.4 g of EDTA in 800 ml of H2O. The pH is adjusted with NaOH and the volume brought to 1 liter.

[0031] "SSC" denotes a hybridization and wash solution comprising sodium chloride and sodium citrate at pH 7. A 20X solution of SSC is made by dissolving 175 g of NaCl and 88 g of sodium citrate in 800 ml of H2O. The volume is brought to 1 liter after adjusting the pH with 10 N NaOH.
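The gram quantities in the two recipes above translate into molar concentrations, and working strengths such as 5X or 0.5X follow by simple dilution. The sketch below shows that arithmetic. It assumes the EDTA is disodium EDTA dihydrate and the citrate is trisodium citrate dihydrate; the text names only "EDTA" and "sodium citrate", so other hydrate forms would give different molarities.

```python
# Molar masses in g/mol. The EDTA and citrate hydrate forms are assumptions.
MOLAR_MASS = {
    "NaCl": 58.44,
    "NaH2PO4.H2O": 137.99,
    "Na2EDTA.2H2O": 372.24,
    "Na3citrate.2H2O": 294.10,
}

SSPE_20X = {"NaCl": 174.0, "NaH2PO4.H2O": 27.6, "Na2EDTA.2H2O": 7.4}  # grams per liter
SSC_20X = {"NaCl": 175.0, "Na3citrate.2H2O": 88.0}                    # grams per liter


def molarity(grams: float, reagent: str, volume_l: float = 1.0) -> float:
    """Concentration (mol/L) of `grams` of `reagent` made up to `volume_l` liters."""
    return grams / MOLAR_MASS[reagent] / volume_l


def stock_volume_ml(stock_x: float, target_x: float, final_ml: float) -> float:
    """Volume of an N-X stock needed for `final_ml` of working solution (C1*V1 = C2*V2)."""
    return final_ml * target_x / stock_x


for name, recipe in (("20X SSPE", SSPE_20X), ("20X SSC", SSC_20X)):
    for reagent, grams in recipe.items():
        print(f"{name}: {reagent} {molarity(grams, reagent):.2f} M")

# 1 liter of 5X SSPE hybridization buffer from the 20X stock
print(stock_volume_ml(20, 5, 1000))  # 250.0
```

Under these assumptions, 20X SSPE works out to about 3.0 M NaCl, 0.2 M phosphate, and 0.02 M EDTA, and 20X SSC to about 3.0 M NaCl and 0.3 M citrate.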

[0032] The ketoreductase gene encodes a novel enzyme that catalyzes an asymmetric reduction of selected ketone substrates (see Equation 1 and Table 1).
Table 1: Substrate specificity of ketoreductase from Z. rouxii.
[0033] The ketoreductase enzymes disclosed herein are members of the carbonyl reductase enzyme class. Carbonyl reductases are involved in the reduction of xenobiotic carbonyl compounds (Hara et al., Arch. Biochem. Biophys., 244, 238-247, 1986) and have been classified into the short-chain dehydrogenase/reductase (SDR) enzyme superfamily (Jornvall et al., Biochemistry, 34, 6003-6013, 1995) and the single-domain reductase/epimerase/dehydrogenase (RED) enzyme superfamily (Labesse et al., Biochem. J., 304, 95-99, 1994). The ketoreductases of this invention are able to effectively reduce a variety of α-ketolactones, α-ketolactams, and diketones (Table 1).

[0034] The ketoreductase gene of Z. rouxii comprises a DNA sequence designated herein as SEQ ID NO:1. Those skilled in the art will recognize that owing to the degeneracy of the genetic code (i.e., 64 codons, 61 of which encode the 20 amino acids and 3 of which are stop codons), numerous "silent" substitutions of nucleotide base pairs could be introduced into the sequence identified as SEQ ID NO:1 without altering the identity of the encoded amino acid(s) or protein product. All such substitutions are intended to be within the scope of the invention.
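The degeneracy argument can be checked mechanically. The Python sketch below builds the standard genetic code (NCBI translation table 1) and tests whether a substituted sequence is "silent", meaning it encodes the same protein. The 12-nucleotide sequences are hypothetical examples, not part of SEQ ID NO:1.

```python
from itertools import product

# NCBI standard genetic code (translation table 1); codons enumerated TTT, TTC, TTA, ..., GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon; '*' marks a stop codon."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def is_silent(original: str, variant: str) -> bool:
    """True if a substituted sequence encodes exactly the same protein."""
    return len(original) == len(variant) and translate(original) == translate(variant)


sense = [aa for aa in CODON_TABLE.values() if aa != "*"]
print(len(CODON_TABLE), len(sense), len(set(sense)))  # 64 61 20

# Three synonymous changes (CTG->TTA, AAA->AAG, TAA->TAG) leave the product unchanged.
print(translate("ATGCTGAAATAA"), is_silent("ATGCTGAAATAA", "ATGTTAAAGTAG"))  # MLK* True
```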



Gene Isolation Procedures



[0035] Those skilled in the art will recognize that the ketoreductase gene may be obtained by a plurality of applicable recombinant DNA techniques including, for example, polymerase chain reaction (PCR) amplification, hybridization to a genomic or cDNA library, or de novo DNA synthesis. (See, e.g., J. Sambrook et al., Molecular Cloning, 2d Ed., Chap. 14 (1989).)

[0036] Methods for constructing cDNA libraries in a suitable vector such as a plasmid or phage for propagation in procaryotic or eucaryotic cells are well known to those skilled in the art. [See, e.g., J. Sambrook et al., supra.] Suitable cloning vectors are widely available.

[0037] Skilled artisans will recognize that the ketoreductase gene or a fragment thereof could be isolated by PCR amplification from a cDNA library prepared from cells in which said gene is expressed, using oligonucleotide primers targeted to any suitable region of SEQ ID NO:1. Methods for PCR amplification are widely known in the art. See, e.g., PCR Protocols: A Guide to Methods and Applications, Ed. M. Innis et al., Academic Press (1990). The amplification reaction comprises template DNA, suitable enzymes, primers, nucleoside triphosphates, and buffers, and is conveniently carried out in a DNA Thermal Cycler (Perkin Elmer Cetus, Norwalk, CT). A positive result is determined by detecting an appropriately-sized DNA fragment following gel electrophoresis.

Protein Production Methods 


[0038] One embodiment of the present invention relates to the substantially purified ketoreductase enzyme (identified herein as SEQ ID NO:2) encoded by the Z. rouxii ketoreductase gene (identified herein as SEQ ID NO:1).
[0039] Skilled artisans will recognize that the proteins of the present invention can be synthesized by a number of different methods, such as chemical methods well known in the art, including solid phase peptide synthesis, or recombinant methods. Both methods are described in U.S. Patent 4,617,149, incorporated herein by reference. The proteins of the invention can also be purified by well known methods from a culture of cells that produce the protein, for example, Z. rouxii.

[0040] The principles of solid phase chemical synthesis of polypeptides are well known in the art and may be found in general texts in the area. See, e.g., H. Dugas and C. Penney, Bioorganic Chemistry (1981) Springer-Verlag, New York, 54-92. For example, peptides may be synthesized by solid-phase methodology utilizing an Applied Biosystems 430A peptide synthesizer (Applied Biosystems, Foster City, CA) and synthesis cycles supplied by Applied Biosystems.
[0041] The protein of the present invention can also be produced by recombinant DNA methods using the cloned ketoreductase gene. Recombinant methods are preferred if a high yield is desired. Expression of the cloned gene can be carried out in a variety of suitable host cells, well known to those skilled in the art. For this purpose, the ketoreductase gene is introduced into a host cell by any suitable means, well known to those skilled in the art. While chromosomal integration of the cloned gene is within the scope of the present invention, it is preferred that the gene be cloned into a suitable extrachromosomally maintained expression vector so that the coding region of the ketoreductase gene is operably linked to a constitutive or inducible promoter.

[0042] The basic steps in the recombinant production of the ketoreductase protein are: 


a) constructing a natural, synthetic or semi-synthetic DNA encoding ketoreductase protein; 

b) integrating said DNA into an expression vector in a manner suitable for expressing the ketoreductase protein, either alone or as a fusion protein; or integrating said DNA into a host chromosome such that said DNA expresses ketoreductase;

c) transforming or otherwise introducing said vector into an appropriate eucaryotic or prokaryotic host cell, forming a recombinant host cell;

d) culturing said recombinant host cell in a manner to express the ketoreductase protein; and

e) recovering and substantially purifying the ketoreductase protein by any suitable means, well known to those 
skilled in the art. 

Expressing Recombinant Ketoreductase Protein in Procaryotic and Eucaryotic Host Cells

[0043] Procaryotes may be employed in the production of the ketoreductase protein. For example, the Escherichia coli K12 strain 294 (ATCC No. 31446) or strain RV308 is particularly useful for the prokaryotic expression of foreign proteins. Other strains of E. coli, bacilli such as Bacillus subtilis, enterobacteriaceae such as Salmonella typhimurium or Serratia marcescens, various Pseudomonas species, and other bacteria, such as Streptomyces, may also be employed as host cells in the cloning and expression of the recombinant proteins of this invention.
[0044] Promoter sequences suitable for driving the expression of genes in procaryotes include β-lactamase [e.g., vector pGX2907, ATCC 39344, contains a replicon and β-lactamase gene], lactose systems [Chang et al., Nature (London), 275:615 (1978); Goeddel et al., Nature (London), 281:544 (1979)], alkaline phosphatase, and the tryptophan (trp) promoter system [vector pATH1 (ATCC 37695), which is designed to facilitate expression of an open reading frame as a trpE fusion protein under the control of the trp promoter]. Hybrid promoters such as the tac promoter (isolatable from plasmid pDR540, ATCC-37282) are also suitable. Still other bacterial promoters, whose nucleotide sequences are generally known, enable one of skill in the art to ligate such promoter sequences to DNA encoding the proteins of the instant invention using linkers or adapters to supply any required restriction sites. Promoters for use in bacterial systems also will contain a Shine-Dalgarno sequence operably linked to the DNA encoding the desired polypeptides. These examples are illustrative rather than limiting.

[0045] The protein(s) of this invention may be synthesized either by direct expression or as a fusion protein comprising the protein of interest as a translational fusion with another protein or peptide which may be removable by enzymatic or chemical cleavage. It is often observed in the production of certain peptides in recombinant systems that expression as a fusion protein prolongs the lifespan of the desired peptide, increases its yield, or provides a convenient means of purifying the protein. A variety of peptidases (e.g., enterokinase and thrombin) which cleave a polypeptide at specific sites or digest the peptides from the amino or carboxy termini (e.g., diaminopeptidase) of the peptide chain are known. Furthermore, particular chemicals (e.g., cyanogen bromide) will cleave a polypeptide chain at specific sites. The skilled artisan will appreciate the modifications necessary to the amino acid sequence (and synthetic or semi-synthetic coding sequence if recombinant means are employed) to incorporate site-specific internal cleavage sites. See, e.g., P. Carter, "Site Specific Proteolysis of Fusion Proteins", Chapter 13, in Protein Purification: From Molecular Mechanisms to Large Scale Processes, American Chemical Society, Washington, D.C. (1990).
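To show why internal cleavage sites must be considered when a fusion is designed for chemical cleavage, the sketch below simulates an idealized cyanogen bromide digest. CNBr cuts on the C-terminal side of methionine. The function ignores incomplete cleavage, and the fusion sequence is hypothetical.

```python
from typing import List


def cnbr_fragments(protein: str) -> List[str]:
    """Idealized cyanogen bromide digest: cut on the C-terminal side of every Met (M).

    Incomplete cleavage, which occurs in real digests, is ignored here.
    """
    fragments, start = [], 0
    for i, residue in enumerate(protein.upper()):
        # Cut after an internal Met; a C-terminal Met leaves nothing to release.
        if residue == "M" and i + 1 < len(protein):
            fragments.append(protein[start:i + 1])
            start = i + 1
    fragments.append(protein[start:])
    return fragments


# Hypothetical fusion: a His-containing leader joined through Met to a short target peptide.
print(cnbr_fragments("MGSSHHHHHHMASKGE"))  # ['M', 'GSSHHHHHHM', 'ASKGE']
```

A target protein with its own internal methionines would be fragmented by the same reaction. For such proteins an enzymatic site, for example one recognized by enterokinase or thrombin, is the more practical choice.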

[0046] In addition to procaryotes, a variety of eucaryotic microorganisms, including yeast, are suitable host cells. The yeast Saccharomyces cerevisiae is the most commonly used eucaryotic microorganism. Other yeasts such as Kluyveromyces lactis, Schizosaccharomyces pombe, and Pichia pastoris are also suitable. For expression in Saccharomyces, the plasmid YRp7 (ATCC-40053), for example, may be used. See, e.g., L. Stinchcomb et al., Nature, 282:39 (1979); J. Kingsman et al., Gene, 7:141 (1979); S. Tschemper et al., Gene, 10:157 (1980). Plasmid YRp7 contains the TRP1 gene, which provides a selectable marker for use in a trp1 auxotrophic mutant.

Purification of Recombinantly-Produced Ketoreductase Protein

[0047] An expression vector carrying a cloned ketoreductase gene is transformed or transfected into a suitable host cell using standard methods. Host cells may comprise procaryotes, such as E. coli, or simple eucaryotes, such as Z. rouxii, S. cerevisiae, S. pombe, P. pastoris, and K. lactis. Cells which contain the vector are propagated under conditions suitable for expression of an encoded ketoreductase protein. If the recombinant gene has been placed under the control of an inducible promoter, then suitable growth conditions would incorporate the appropriate inducer. The recombinantly-produced protein may be purified from cellular extracts of transformed cells by any suitable means.

[0048] In a preferred process for protein purification, the ketoreductase gene is modified at the 5' end to incorporate several histidine residues at the amino terminus of the ketoreductase protein product. This "histidine tag" enables a single-step protein purification method referred to as "immobilized metal ion affinity chromatography" (IMAC), essentially as described in U.S. Patent 4,569,794, which hereby is incorporated by reference. The IMAC method enables rapid isolation of substantially pure ketoreductase protein starting from a crude cellular extract.

[0049] Other embodiments of the present invention comprise isolated nucleic acid sequences which encode SEQ ID NO:2. As skilled artisans will recognize, the amino acid compounds of the invention can be encoded by a multitude of different nucleic acid sequences because most of the amino acids are encoded by more than one codon. Because these alternative nucleic acid sequences would encode the same amino acid sequences, the present invention further comprises these alternate nucleic acid sequences.

[0050] The ketoreductase genes disclosed herein, for example SEQ ID NO:1, may be produced using synthetic methodology. The synthesis of nucleic acids is well known in the art. See, e.g., E.L. Brown, R. Belagaje, M.J. Ryan, and H.G. Khorana, Methods in Enzymology, 68:109-151 (1979). A DNA segment corresponding to a ketoreductase gene could be generated using a conventional DNA synthesizing apparatus, such as the Applied Biosystems Model 380A or 380B DNA synthesizers (Applied Biosystems, Inc., 850 Lincoln Center Drive, Foster City, CA 94404), which employ phosphoramidite chemistry. Alternatively, phosphotriester chemistry may be employed to synthesize the nucleic acids of this invention. [See, e.g., M.J. Gait, ed., Oligonucleotide Synthesis, A Practical Approach (1984).]
[0051] In an alternative methodology, namely PCR, a DNA sequence comprising a portion or all of SEQ ID NO:1, SEQ ID NO:4, SEQ ID NO:7, SEQ ID NO:10, or SEQ ID NO:13 can be generated from a suitable DNA source, for example Z. rouxii or S. cerevisiae genomic DNA or cDNA. For this purpose, suitable oligonucleotide primers targeting SEQ ID NO:1, SEQ ID NO:4, SEQ ID NO:7, SEQ ID NO:10, SEQ ID NO:13, or a region therein are prepared, as described in U.S. Patent No. 4,889,818, which hereby is incorporated by reference. Protocols for performing the PCR are disclosed in, for example, PCR Protocols: A Guide to Methods and Applications, Ed. Michael A. Innis et al., Academic Press, Inc. (1990).
[0052] The ribonucleic acids of the present invention may be prepared using the polynucleotide synthetic methods discussed supra, or they may be prepared enzymatically using RNA polymerase to transcribe a ketoreductase DNA template. See, e.g., J. Sambrook et al., supra, at 18.82-18.84.

[0053] This invention also provides nucleic acids, RNA or DNA, which are complementary to SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:13, or SEQ ID NO:15.

[0054] The present invention also provides probes and primers useful for a variety of molecular biology techniques including, for example, hybridization screens of genomic, subgenomic, or cDNA libraries. A nucleic acid compound comprising SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:13, or SEQ ID NO:15, or a complementary sequence thereof, or a fragment thereof, which is at least 18 base pairs in length, and which will selectively hybridize to DNA encoding a ketoreductase, is provided. Preferably, the 18 or more base pair compound is DNA. See, e.g., B. Wallace and G. Miyada, "Oligonucleotide Probes for the Screening of Recombinant DNA Libraries," in Methods in Enzymology, Vol. 152, 432-442, Academic Press (1987).
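When an 18-mer or longer probe is chosen, a quick first estimate of its melting temperature helps in picking hybridization and wash temperatures. The sketch below uses the so-called Wallace rule (2°C per A or T plus 4°C per G or C). This is a rough rule of thumb for short oligonucleotides in high-salt buffer. It is not a method stated in this document and it ignores formamide, so it will not match the formamide-containing stringency conditions defined earlier. The probe sequence is hypothetical.

```python
def wallace_tm(oligo: str) -> int:
    """Rough melting temperature (deg C) of a short DNA oligo: 2 per A/T plus 4 per G/C."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def is_candidate_probe(oligo: str, min_length: int = 18) -> bool:
    """Apply the 18-base minimum length from the text and reject non-ACGT characters."""
    return len(oligo) >= min_length and set(oligo.upper()) <= set("ACGT")


probe = "ATGCGTACGTTAGCATGC"  # hypothetical 18-mer, not taken from SEQ ID NO:1
print(len(probe), is_candidate_probe(probe), wallace_tm(probe))  # 18 True 54
```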

[0055] Probes and primers can be prepared by enzymatic methods well known to those skilled in the art [see, e.g., Sambrook et al., supra]. In a most preferred embodiment, these probes and primers are synthesized using chemical means as described above.

[0056] Another aspect of the present invention relates to recombinant DNA cloning vectors and expression vectors comprising the nucleic acids of the present invention. The preferred nucleic acid vectors are those which comprise DNA. The most preferred recombinant DNA vectors comprise an isolated DNA sequence selected from the group consisting of SEQ ID NO:1, SEQ ID NO:4, SEQ ID NO:7, SEQ ID NO:10, and SEQ ID NO:13.
[0057] The skilled artisan understands that choosing the most appropriate cloning vector or expression vector depends upon a number of factors including the availability of restriction enzyme sites, the type of host cell into which the vector is to be transfected or transformed, the purpose of the transfection or transformation (e.g., stable transformation as an extrachromosomal element, or integration into the host chromosome), the presence or absence of readily assayable or selectable markers (e.g., antibiotic resistance and metabolic markers of one type and another), and the number of copies of the gene to be present in the host cell.

[0058] Vectors suitable to carry the nucleic acids of the present invention comprise RNA viruses, DNA viruses, lytic bacteriophages, lysogenic bacteriophages, stable bacteriophages, plasmids, viroids, and the like. The most preferred vectors are plasmids.

[0059] When preparing an expression vector, the skilled artisan understands that there are many variables to be considered, for example, whether to use a constitutive or inducible promoter. Inducible promoters are preferred because they enable high-level, regulatable expression of an operably linked gene. Constitutive promoters are further suitable in instances for which secretion or extracellular export is desirable. The skilled artisan will recognize a number of inducible promoters which respond to a variety of inducers, for example, carbon source, metal ions, and heat. The practitioner also understands that the amount of nucleic acid or protein to be produced dictates, in part, the selection of the expression system. The addition of certain nucleotide sequences is useful for directing the localization of a recombinant protein. For example, a sequence encoding a signal peptide preceding the coding region of a gene is useful for directing the extracellular export of a resulting polypeptide.

[0060] Host cells harboring the nucleic acids disclosed herein are also provided by the present invention. Suitable host cells include procaryotes, such as E. coli, or simple eucaryotes, such as fungal cells, which have been transfected or transformed with a vector which comprises a nucleic acid of the present invention.

[0061] The present invention also provides a method for constructing a recombinant host cell capable of expressing SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:11, or SEQ ID NO:14, said method comprising transforming or otherwise introducing into a host cell a recombinant DNA vector that comprises an isolated DNA sequence which encodes SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:11, or SEQ ID NO:14. Preferred vectors for expression are those which comprise SEQ ID NO:1. Transformed host cells may be cultured under conditions well known to skilled artisans such that SEQ ID NO:2, SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:11, or SEQ ID NO:14 is expressed, thereby producing a ketoreductase protein in the recombinant host cell.

[0062] For the purpose of identifying or developing inhibitors or other modifiers of the enzymes disclosed herein, or for identifying suitable substrates for bioconversion, it would be desirable to identify compounds that bind and/or inhibit, or otherwise modify, the ketoreductase enzyme and its associated activity. A method for determining agents that will modify the ketoreductase activity comprises contacting the ketoreductase protein with a test compound and monitoring the alteration of enzyme activity by any suitable means.

[0063] The instant invention provides such a screening system useful for discovering compounds which bind the ketoreductase protein, said screening system comprising the steps of:

a) preparing ketoreductase protein; 
b) exposing said ketoreductase protein to a test compound;

c) quantifying a modulation of activity by said compound. 

[0064] Utilization of the screening system described above provides a means to determine compounds which may alter the activity of ketoreductase. This screening method may be adapted to automated procedures such as a PANDEX® (Baxter-Dade Diagnostics) system, allowing for efficient high-volume screening of potential modifying agents.
[0065] In such a screening protocol, ketoreductase is prepared as described herein, preferably using recombinant DNA technology. A test compound is introduced into a reaction vessel containing ketoreductase, followed by addition of enzyme substrate. For convenience, the reaction can be coupled to the oxidation of NADPH, thereby enabling progress to be monitored spectrophotometrically by measuring the absorbance at 340 nm. Alternatively, substrate may be added simultaneously with a test compound. In one method, a radioactively or chemically labeled compound may be used. The products of the enzymatic reaction are assayed for the chemical label or radioactivity by any suitable means. The absence or diminution of the chemical label or radioactivity indicates the degree to which the reaction is inhibited.
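For the spectrophotometric readout, NADPH absorbs at 340 nm and NADP+ does not, so the absorbance falls as the reduction proceeds. The sketch below converts that slope into a rate using the Beer-Lambert law and the commonly used molar absorptivity of NADPH at 340 nm (6220 M^-1 cm^-1), then expresses the effect of a test compound as percent inhibition. The 1 cm path length and both slopes are assumptions for illustration, not values from this document.

```python
NADPH_EPSILON_340 = 6220.0  # molar absorptivity of NADPH at 340 nm, M^-1 cm^-1


def nadph_rate_um_per_min(delta_a340_per_min: float, path_cm: float = 1.0) -> float:
    """NADPH consumed per minute (uM/min) from the slope of A340 versus time (Beer-Lambert)."""
    # The slope is negative while NADPH is oxidized; only its magnitude matters for the rate.
    return abs(delta_a340_per_min) / (NADPH_EPSILON_340 * path_cm) * 1e6


def percent_inhibition(rate_with_compound: float, rate_control: float) -> float:
    """Loss of activity relative to a no-compound control, in percent."""
    return 100.0 * (1.0 - rate_with_compound / rate_control)


control = nadph_rate_um_per_min(-0.0622)  # hypothetical slope without test compound
treated = nadph_rate_um_per_min(-0.0311)  # hypothetical slope with test compound
print(f"{control:.1f} uM/min, {treated:.1f} uM/min, {percent_inhibition(treated, control):.0f}% inhibition")
# 10.0 uM/min, 5.0 uM/min, 50% inhibition
```

Comparing initial rates rather than end-point absorbances keeps the readout insensitive to how far each reaction has run when it is read.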

[0066] The following examples more fully describe the present invention. Those skilled in the art will recognize that the particular reagents, equipment, and procedures described are merely illustrative and are not intended to limit the present invention in any manner.

EXAMPLE 1 


Construction of a DNA Vector for Expressing a Ketoreductase Gene in a Homologous or Heterologous Host

[0067] A plasmid comprising the Z. rouxii ketoreductase gene suitable for expressing said gene in a host cell, for example E. coli (DE3) strains, contains an origin of replication (Ori) and an ampicillin resistance gene (Amp), useful for selecting cells which have incorporated the vector following a transformation procedure, and further comprises the lacI gene for repression of the lac operon, as well as the T7 promoter and T7 terminator sequences in operable linkage to the coding region of the ketoreductase gene. Parent plasmid pET11A (obtained from Novagen, Madison, WI) was linearized by digestion with endonucleases NdeI and BamHI. Linearized pET11A was ligated to a DNA fragment bearing NdeI and BamHI sticky ends and further comprising the coding region of the Z. rouxii ketoreductase gene.
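Directional NdeI/BamHI cloning works only if the insert carries each site once, at the intended end, and has no internal copy that would cut the coding region. The sketch below checks a fragment for that. The recognition sequences are the standard ones (NdeI CA^TATG, whose ATG can serve as the start codon, and BamHI G^GATCC). The fragments are hypothetical, and the extra flanking bases usually added to PCR primers so the enzymes cut efficiently are left out.

```python
from typing import List

NDEI_SITE = "CATATG"   # NdeI recognition sequence (CA^TATG); contains the ATG start codon
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence (G^GATCC)


def find_sites(seq: str, site: str) -> List[int]:
    """0-based start positions of `site` on the top strand (both sites are palindromic)."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def is_directional_insert(fragment: str) -> bool:
    """True if NdeI occurs only at the 5' end and BamHI only at the 3' end of the fragment."""
    return (find_sites(fragment, NDEI_SITE) == [0]
            and find_sites(fragment, BAMHI_SITE) == [len(fragment) - len(BAMHI_SITE)])


good = "CATATGAAAGCTTTCTAAGGATCC"  # hypothetical ORF flanked by the two sites
bad = "CATATGGGATCCAAATAAGGATCC"   # internal BamHI site would cut the ORF
print(is_directional_insert(good), is_directional_insert(bad))  # True False
```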

[0068] The ketoreductase gene is isolated most conveniently by PCR. Genomic DNA from Z. rouxii, isolated by standard methods, was used for amplification of the ketoreductase gene. Primers are synthesized corresponding to the 5' and 3' ends of the gene (SEQ ID NO:1) to enable amplification of the coding region.
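The two end primers are derived differently. The forward primer matches the top strand at the 5' end, and the reverse primer is the reverse complement of the 3' end. The sketch below shows this for a 1-based coordinate range and also reports the expected product size to look for on the gel. The 20-nucleotide primer length is an assumption, any restriction-site tails are omitted, and the template is a synthetic stand-in shaped so that its coding region spans nucleotides 164-1177 as described for SEQ ID NO:1. It is not the real sequence.

```python
from typing import Tuple

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def end_primers(template: str, start: int, end: int, length: int = 20) -> Tuple[str, str, int]:
    """Forward/reverse primers at the ends of template[start..end] (1-based, inclusive).

    Returns (forward 5'->3', reverse 5'->3', expected amplicon length in bp).
    """
    region = template[start - 1:end].upper()
    forward = region[:length]                       # same sequence as the top strand at the 5' end
    reverse = reverse_complement(region[-length:])  # anneals to the top strand at the 3' end
    return forward, reverse, len(region)


# Synthetic stand-in: 163 nt upstream, then ATG ... TAA occupying nucleotides 164-1177.
template = "A" * 163 + "ATG" + "GCT" * 336 + "TAA" + "C" * 50
fwd, rev, size = end_primers(template, 164, 1177)
print(fwd, rev)
print(size)  # 1014
```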

[0069] The ketoreductase gene (nucleotides 164 through 1177 of SEQ ID NO:1) ligated into the vector was modified
at the 5' end (amino terminus of the encoded protein) in order to simplify purification of the encoded ketoreductase protein.
For this purpose, an oligonucleotide encoding 8 histidine residues and a factor Xa cleavage site was inserted after the
ATG start codon at nucleotide positions 164 to 166 of SEQ ID NO:1. Placement of the histidine residues at the amino
terminus of the encoded protein does not affect its activity and serves only to enable the one-step IMAC (immobilized
metal affinity chromatography) protein purification procedure.
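A minimal sketch of the amino-terminal modification, assuming particular codons: CAT for each of the eight histidines and ATC GAA GGT CGT for the factor Xa recognition sequence Ile-Glu-Gly-Arg. The source does not give the oligonucleotide sequence, so these codons are illustrative only. Translating the result with the standard genetic code confirms that the tag sits in frame after Met-1 and that the native sequence (Thr-Lys-Val-..., SEQ ID NO:2) follows the cleavage site.

```python
"""Sketch: insert a His8 + factor Xa coding sequence after the ATG codon.

Assumptions (not from the source): the tag codons, i.e. CAT for His and
ATC GAA GGT CGT for the factor Xa site Ile-Glu-Gly-Arg.
"""

# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: _AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(_BASES)
               for j, b in enumerate(_BASES)
               for k, c in enumerate(_BASES)}

HIS_TAG = "CAT" * 8                # eight histidine codons
FACTOR_XA_SITE = "ATCGAAGGTCGT"    # Ile-Glu-Gly-Arg; factor Xa cuts after Arg


def translate(dna: str) -> str:
    """Translate complete codons in frame 1 (one-letter code, '*' = stop)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def add_n_terminal_tag(cds: str, tag: str = HIS_TAG + FACTOR_XA_SITE) -> str:
    """Place the tag directly after the ATG start codon, keeping the frame."""
    if not cds.startswith("ATG") or len(tag) % 3:
        raise ValueError("need an ATG start and a tag of whole codons")
    return cds[:3] + tag + cds[3:]


five_end = "ATGACAAAAGTCTTCGTAACAGGTGCC"   # nt 164-190 of SEQ ID NO:1
tagged = add_n_terminal_tag(five_end)
print(translate(five_end))  # MTKVFVTGA
print(translate(tagged))    # MHHHHHHHHIEGRTKVFVTGA
```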

EXAMPLE 2

Purification of Ketoreductase from Z. rouxii

[0070] Approximately 1 gram of Z. rouxii cell paste was resuspended in Lysing Buffer, comprising 50 mM Tris-Cl pH
7.5 and 2 mM EDTA, supplemented with pepstatin (1 µg/mL), leupeptin (1.25 µg/mL), aprotinin (2.5 µg/mL), and AEBSF
(25 µg/mL). The cells were lysed using a DynoMill (Glen Mills, Inc., Clifton, NJ) equipped with 0.5-0.75 mm lead-free
beads under continuous flow conditions according to the manufacturer's recommended use. After four complete passes
through the DynoMill, the material was centrifuged twice (25,000 x g for 30 minutes at 4°C). Solid ammonium sulfate
(291 g/liter) was added slowly to the resulting clarified cell extract with stirring at 4°C to achieve 50% saturation. After
1 hour, the mixture was centrifuged at 23,000 x g for 30 minutes. The supernatant was then brought to 65% saturation
by the addition of solid ammonium sulfate (159 g/liter) and stirred for 1 h at 4°C before centrifugation (23,000 x g for 30
min). The resultant 50-85% ammonium sulfate pellet was resuspended in 600 mL of Lysing Buffer, and the residual
ammonium sulfate was removed by dialysis against the same buffer at 4°C. The desalted material was centrifuged
twice to remove particulate matter (23,000 x g for 30 min), and 700-800 Units of the clarified material were loaded onto
a Red-120 dye affinity column (32 mm x 140 mm) equilibrated in 50 mM Tris-Cl pH 7.5, 1 mM MgCl2, pepstatin (1 µg/
mL), leupeptin (1.25 µg/mL), and aprotinin (2.5 µg/mL). Reductase activity was eluted from the column at a flow rate
of 8 mL/min under the following conditions: 1) a 10 minute linear gradient from 0-0.3 M NaCl; 2) 13 minutes at 0.3 M
NaCl; 3) a 60 minute linear gradient from 0.3-1.5 M NaCl. The fractions containing reductase activity were pooled
and exchanged into 20 mM potassium phosphate buffer (pH 7.2), pepstatin (1 µg/mL), leupeptin (1.25 µg/mL), and aprotinin
(2.5 µg/mL) by dialysis at 4°C. The sample was clarified by centrifugation (23,000 x g for 30 min), and 400 Units were
loaded onto a Bio-Scale CHT-I hydroxyapatite column (15 mm x 113 mm, Bio-Rad, Inc.) equilibrated in the same buffer
that had been made 5% in glycerol. Reductase activity was eluted from the column at a flow rate of 5.0 mL/min in a
sodium chloride step gradient consisting of 5 minutes at 0 M NaCl, a step to 0.7 M NaCl which was maintained
for 10 minutes, and then a 20 minute linear gradient from 0.7-1.0 M NaCl. The fractions containing reductase activity
were pooled and desalted into 20 mM potassium phosphate buffer (pH 7.2), pepstatin A (1 µg/mL), leupeptin (1.25
µg/mL), and aprotinin (2.5 µg/mL) by dialysis at 4°C. The sample (100-200 Units) was loaded onto a Bio-Scale CHT-I
hydroxyapatite column (10 mm x 64 mm) equilibrated in the same buffer which had been made 5% in glycerol. Reductase
activity was eluted from the column at a flow rate of 2.0 mL/min in a 25 minute linear gradient from 0 to 50%
400 mM potassium phosphate (pH 6.8), 5% glycerol. Fractions containing reductase activity were pooled and exchanged
into 10 mM Tris-Cl (pH 8.5) by dialysis at 4°C. The sample was then made 10% in glycerol, concentrated to 0.4 mg/
mL by ultrafiltration (Amicon, YM-10), and stored at -70°C.
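The elution programs in [0070] are piecewise-linear salt gradients, so the programmed NaCl concentration at any time, and the eluate volume, follow directly from the segment durations and the flow rate. The Python sketch below encodes the Red-120 program (8 mL/min) and the first hydroxyapatite program (5.0 mL/min) as (duration, start, end) segments; this representation and the helper names are hypothetical conveniences, not part of the source.

```python
"""Sketch: NaCl concentration and eluate volume along a gradient program.

Each segment is (duration_min, start_M, end_M); a hold has start == end,
and a step is simply a new segment that starts at a different value.
"""
from typing import List, Tuple

Segment = Tuple[float, float, float]

# Red-120 dye affinity column, 8 mL/min.
RED_120: List[Segment] = [(10, 0.0, 0.3), (13, 0.3, 0.3), (60, 0.3, 1.5)]
# First CHT-I hydroxyapatite column, 5.0 mL/min (step to 0.7 M after 5 min).
CHT_I: List[Segment] = [(5, 0.0, 0.0), (10, 0.7, 0.7), (20, 0.7, 1.0)]


def concentration_at(program: List[Segment], t_min: float) -> float:
    """Programmed salt concentration (M) at t_min; linear within a segment."""
    elapsed = 0.0
    for duration, start, end in program:
        if t_min <= elapsed + duration:
            frac = (t_min - elapsed) / duration
            return start + frac * (end - start)
        elapsed += duration
    raise ValueError("time lies beyond the end of the program")


def total_volume_ml(program: List[Segment], flow_ml_per_min: float) -> float:
    """Eluate volume delivered over the whole program."""
    return flow_ml_per_min * sum(duration for duration, _, _ in program)


print(concentration_at(RED_120, 5))              # 0.15
print(round(concentration_at(RED_120, 53), 3))   # 0.9 (midpoint of 0.3-1.5 M ramp)
print(total_volume_ml(RED_120, 8.0))             # 664.0
print(total_volume_ml(CHT_I, 5.0))               # 175.0
```

Treating a step as a zero-slope segment that begins at the new value keeps holds, steps and ramps in one representation.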

EXAMPLE 3

Reductase Activity Using the Ketoreductase from Z. rouxii

[0071] Reductase activity was measured using a suitable substrate and a partially purified or substantially purified
ketoreductase from Z. rouxii. Activity was measured as the change in absorbance at 340 nm resulting from
the oxidation of NADPH. The 1 mL assay contained 3.0 mM 3,4-methylenedioxyphenyl acetone, 162 µM
NADPH, 50 mM MOPS buffer (pH 6.8), and 0.6 mU of ketoreductase, and was carried out at 26°C. Reaction mixtures
were first equilibrated at 26°C for 10 min in the absence of NADPH, and the reaction was then initiated by addition of NADPH. The
absorbance at 340 nm was measured every 15 seconds over a 5 minute period; the change in absorbance was found
to be linear over that time period. The kinetic parameters for 3,4-methylenedioxyphenyl acetone were determined at
an NADPH concentration of 112 µM and a 3,4-methylenedioxyphenyl acetone concentration that varied from 1.7 mM
to 7.2 mM. The kinetic parameters for NADPH were determined by holding the 3,4-methylenedioxyphenyl acetone
concentration at 3 mM while varying the NADPH concentration from 20.5 µM to 236.0 µM. An extinction coefficient
of 6220 M⁻¹ cm⁻¹ for NADPH absorbance at 340 nm was used to calculate the specific activity of the enzyme. For
assays using isatin, the change in absorbance with time was measured at 414 nm, using an extinction coefficient of
849 M⁻¹ cm⁻¹ to calculate activity. One Unit of activity corresponds to 1 µmol of NADPH consumed per minute. For
assays carried out at differing pH values, 10 mM Bis-Tris and 10 mM Tris were adjusted to the appropriate pH with
HCl. Kinetic parameters were determined by non-linear regression using the JMP® statistics and graphics program.
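The arithmetic behind the assay can be sketched as follows. The rate is taken as the least-squares slope of the absorbance readings (every 15 s for 5 min) and converted to Units with the Beer-Lambert law, assuming a 1 cm path length and the 1 mL assay volume; the 6220 and 849 M⁻¹ cm⁻¹ coefficients are those given above. For the kinetic constants the source used non-linear regression in JMP; the sketch substitutes a different, self-contained least-squares fit of v = Vmax·S/(Km + S), in which Vmax is solved exactly for each trial Km and Km is located by a grid search refined by golden-section search. All data in the example are synthetic.

```python
"""Sketch: absorbance slope -> Units, plus a Michaelis-Menten fit.

Assumptions (not from the source): 1 cm path length, synthetic data, and
the fitting method (profile least squares with a golden-section search),
which replaces the JMP non-linear regression used in the source.
"""
import math

EPS_NADPH_340 = 6220.0   # M^-1 cm^-1, NADPH at 340 nm (from the text)
EPS_ISATIN_414 = 849.0   # M^-1 cm^-1, isatin at 414 nm (from the text)


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def units(dA_per_min, eps, volume_ml=1.0, path_cm=1.0):
    """Activity in Units (umol/min) from |dA/dt| via Beer-Lambert."""
    molar_per_min = abs(dA_per_min) / (eps * path_cm)   # mol L^-1 min^-1
    return molar_per_min * (volume_ml / 1000.0) * 1e6   # umol min^-1


def _profile(km, s, v):
    """Best-fit Vmax for a fixed Km, and the residual sum of squares."""
    f = [si / (km + si) for si in s]
    vmax = sum(vi * fi for vi, fi in zip(v, f)) / sum(fi * fi for fi in f)
    return vmax, sum((vi - vmax * fi) ** 2 for vi, fi in zip(v, f))


def fit_michaelis_menten(s, v, lo=1e-3, hi=1e3, n=200, tol=1e-10):
    """Return (Vmax, Km) minimising squared error of v = Vmax*S/(Km+S)."""
    r = (hi / lo) ** (1.0 / (n - 1))
    grid = [lo * r ** i for i in range(n)]               # log-spaced Km grid
    best = min(range(n), key=lambda i: _profile(grid[i], s, v)[1])
    a, b = grid[max(best - 1, 0)], grid[min(best + 1, n - 1)]
    g = (math.sqrt(5) - 1) / 2                           # golden-section ratio
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if _profile(c, s, v)[1] < _profile(d, s, v)[1]:
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    km = (a + b) / 2
    return _profile(km, s, v)[0], km


# Absorbance every 15 s for 5 min (21 readings); synthetic linear decrease.
times = [0.25 * i for i in range(21)]                    # minutes
a340 = [1.0 - 0.0622 * t for t in times]
print(round(units(slope(times, a340), EPS_NADPH_340), 6))   # 0.01 (i.e. 10 mU)

# Synthetic substrate series within 1.7-7.2 mM, generated with Km 2.0, Vmax 1.0.
s_mM = [1.7, 2.5, 3.5, 5.0, 7.2]
rates = [1.0 * x / (2.0 + x) for x in s_mM]
vmax, km = fit_michaelis_menten(s_mM, rates)
print(round(vmax, 4), round(km, 4))                      # 1.0 2.0
```

With real, noisy data the same call returns the least-squares estimates; the grid step only serves to bracket the minimum before refinement.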

EXAMPLE 4

Whole Cell Method for Stereoselective Reduction of Ketone Using Recombinant Yeast Cell 

[0072] A vector for expressing the cloned Z. rouxii ketoreductase gene (SEQ ID NO:1) in a prokaryotic or fungal cell,
such as S. cerevisiae, is constructed as follows. A 1014 base pair fragment of Z. rouxii genomic DNA or cDNA, carrying
the ketoreductase gene, is amplified by PCR using primers targeted to the ends of the coding region specified in SEQ
ID NO:1. It is desirable that the primers also incorporate suitable restriction sites for cloning the 1014 base pair
fragment into an expression vector. The appropriate fragment encoding the ketoreductase is amplified and purified using
standard methods for cloning into an expression vector.
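The length bookkeeping can be checked with a short sketch: the coding region at nucleotides 164-1177 of SEQ ID NO:1 spans 1014 base pairs, i.e. 338 codons, in agreement with the 338-residue protein of SEQ ID NO:2. The helper functions are hypothetical, and the toy sequence used to exercise the open-reading-frame check is synthetic.

```python
"""Sketch: CDS coordinates and a basic open-reading-frame check.

Hypothetical helpers; coordinates are 1-based and inclusive, as in the
sequence listing (e.g. LOCATION: 164..1177).
"""

STOP_CODONS = {"TAA", "TAG", "TGA"}


def feature_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive feature location."""
    return end - start + 1


def extract(seq: str, start: int, end: int) -> str:
    """Slice a 1-based, inclusive location out of a sequence string."""
    return seq[start - 1:end]


def orf_problems(cds: str, following_codon: str = "") -> list[str]:
    """List problems with a putative CDS; an empty list means none found."""
    problems = []
    if len(cds) % 3:
        problems.append("length is not a multiple of 3")
    if not cds.startswith("ATG"):
        problems.append("does not start with ATG")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if any(codon in STOP_CODONS for codon in codons):
        problems.append("contains an in-frame stop codon")
    if following_codon and following_codon not in STOP_CODONS:
        problems.append("is not followed by a stop codon")
    return problems


n = feature_length(164, 1177)
print(n, n // 3)                                      # 1014 338

# Synthetic toy: 5-nt leader, ATG AAA GCC, TAA stop, 2-nt trailer.
toy = "CCAAT" + "ATGAAAGCC" + "TAA" + "GG"
orf = extract(toy, 6, 14)
print(orf, orf_problems(orf, extract(toy, 15, 17)))   # ATGAAAGCC []
```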

[0073] A suitable vector for expression in E. coli and S. cerevisiae is pYX213 (available from Novagen, Inc., 597
Science Drive, Madison, WI 53711; Code MBV-029-10), a 7.5 kb plasmid that carries the following genetic markers:
ori, 2µ circle, Ampʳ, CEN, URA3, and the GAL promoter for high-level expression in yeast. Downstream of the GAL
promoter, pYX213 carries a multiple cloning site (MCS), which will accommodate the ketoreductase gene amplified in
the preceding step. A recombinant plasmid is created by digesting pYX213 and the amplified ketoreductase gene with
a restriction enzyme, such as BamHI, and ligating the fragments together.
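Before digestion it is worth confirming where the enzyme cuts. The sketch below, which assumes the standard BamHI recognition sequence G^GATCC and uses synthetic sequences rather than the actual pYX213 or insert sequences, lists site positions and the fragment sizes expected from a complete digest of a linear or circular molecule. The function names are hypothetical.

```python
"""Sketch: find restriction sites and predict complete-digest fragment sizes.

Assumes the standard BamHI site G^GATCC (cut after the first base on the
top strand). Only top-strand cut positions are used, so sticky-end
overhangs are ignored in the size calculation.
"""


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]


def fragment_sizes(seq: str, site: str, cut_offset: int,
                   circular: bool = False) -> list[int]:
    """Fragment lengths after cutting at every site."""
    cuts = [i + cut_offset for i in find_sites(seq, site)]
    if not cuts:
        return [len(seq)]
    if circular:
        # k cuts in a circle give k fragments; the last one spans the origin.
        sizes = [b - a for a, b in zip(cuts, cuts[1:])]
        sizes.append(len(seq) - cuts[-1] + cuts[0])
        return sizes
    edges = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(edges, edges[1:])]


BAMHI, BAMHI_CUT = "GGATCC", 1      # G^GATCC

# Synthetic PCR product: clamp + BamHI + short ORF + BamHI + clamp.
product = "GC" + "GGATCC" + "ATGAAAGCCTAA" + "GGATCC" + "GC"
print(find_sites(product, BAMHI))                  # [2, 20]
print(fragment_sizes(product, BAMHI, BAMHI_CUT))   # [3, 18, 7]
```

Because both ends of such a fragment carry identical BamHI overhangs, the insert can ligate into the vector in either orientation, so recombinant clones would ordinarily be screened for orientation relative to the GAL promoter.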

[0074] A recombinant expression vector carrying the Z. rouxii ketoreductase gene is transformed into a suitable Ura⁻
strain of S. cerevisiae using well-known methods. Ura⁺ transformants are selected on minimal medium lacking uracil.
[0075] Expression of the recombinant ketoreductase gene may be induced, if desired, by growing transformants in
minimal medium that contains 2% galactose as the sole carbon source.

[0076] To carry out a whole-cell stereospecific reduction, 3,4-methylenedioxyphenyl acetone is added to a culture of
transformants to a concentration of about 10 grams per liter of culture. The culture is incubated with shaking at room
temperature for 24 hours, and the presence of the chiral alcohol is analyzed by HPLC.
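For orientation, the substrate loading in [0076] can be converted to molar terms. The sketch assumes the molecular formula C10H10O3 for 3,4-methylenedioxyphenyl acetone, standard atomic masses, complete conversion within the 24 hours, and one NADPH consumed per ketone reduced (so that 1 Unit converts 1 µmol of ketone per minute); the result is a rough average-rate estimate, not a figure from the source.

```python
"""Sketch: express the [0076] substrate loading in molar terms.

Assumptions (not from the source): formula C10H10O3, standard atomic
masses, complete conversion in 24 h, and one NADPH consumed per ketone
reduced, so 1 Unit (1 umol NADPH/min) converts 1 umol of ketone per minute.
"""

ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}   # g/mol


def molar_mass(formula: dict[str, int]) -> float:
    """Molar mass (g/mol) from an element -> atom-count mapping."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def g_per_l_to_mm(conc_g_per_l: float, mw: float) -> float:
    """Convert g/L to mmol/L (mM)."""
    return conc_g_per_l / mw * 1000.0


def mean_units_per_liter(conc_mm: float, hours: float) -> float:
    """Average activity (umol/min per liter) needed to convert conc_mm in `hours`."""
    return conc_mm * 1000.0 / (hours * 60.0)


mw = molar_mass({"C": 10, "H": 10, "O": 3})
conc = g_per_l_to_mm(10.0, mw)
print(round(mw, 2), round(conc, 1), round(mean_units_per_liter(conc, 24), 1))
# 178.19 56.1 39.0
```

Under these assumptions, 10 g/L is about 56 mM, roughly 19 times the 3.0 mM substrate concentration of the assay in Example 3, and full conversion in 24 hours corresponds to an average of about 39 Units of reductase activity per liter of culture.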
Annex to the description
SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: ELI LILLY AND COMPANY
(B) STREET: Lilly Corporate Center
(C) CITY: Indianapolis
(D) STATE: Indiana
(E) COUNTRY: United States of America
(F) ZIP: 46285

(ii) TITLE OF INVENTION: Ketoreductase Gene and Protein From Yeast

(iii) NUMBER OF SEQUENCES: 15

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: A. M. Denholm
(B) STREET: Erl Wood Manor
(C) CITY: Windlesham
(D) STATE: Surrey
(E) COUNTRY: United Kingdom
(F) ZIP: GU20 6PH

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1270 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 164..1177
(D) OTHER INFORMATION: Z. rouxii ketoreductase

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

TGAATGGTTA TTTTAGCAAT TGCTGTGTGA GGCACTGACC TAAAGATGTG TATAAATAGT 60

GGGACTGTGT ACTCATGAGG ATCAATACAT GTATAAACTT ACCATACTTT CACACAAGTC 120

AACTTAGAAT CAATCAATCA ATCAATTAAT CAAGCTATAC AAT ATG ACA AAA GTC 175
Met Thr Lys Val
1

TTC GTA ACA GGT GCC AAC GGA TTC GTT GCT CAA CAC GTC GTT CAT CAA 223
Phe Val Thr Gly Ala Asn Gly Phe Val Ala Gln His Val Val His Gln
5 10 15 20

CTA TTA GAA AAG AAC TAT ACA GTG GTT GGA TCT GTC CGT TCA ACT GAG 271
Leu Leu Glu Lys Asn Tyr Thr Val Val Gly Ser Val Arg Ser Thr Glu
25 30 35

AAA GGT GAT AAA TTA GCT AAA TTG CTA AAC AAT CCA AAA TTT TCA TAT 319
Lys Gly Asp Lys Leu Ala Lys Leu Leu Asn Asn Pro Lys Phe Ser Tyr
40 45 50

GAG ATT ATT AAA GAT ATG GTC AAT TCG AGA GAT GAA TTC GAT AAG GCT 367
Glu Ile Ile Lys Asp Met Val Asn Ser Arg Asp Glu Phe Asp Lys Ala
55 60 65

TTA CAA AAA CAT TCA GAT GTT GAA ATT GTC TTA CAT ACT GCT TCA CCA 415
Leu Gln Lys His Ser Asp Val Glu Ile Val Leu His Thr Ala Ser Pro
70 75 80

GTC TTC CCA GGT GGT ATT AAA GAT GTT GAA AAA GAA ATG ATC CAA CCA 463
Val Phe Pro Gly Gly Ile Lys Asp Val Glu Lys Glu Met Ile Gln Pro
85 90 95 100

GCT GTT AAT GGT ACT AGA AAT GTC TTG TTA TCA ATC AAG GAT AAC TTA 511
Ala Val Asn Gly Thr Arg Asn Val Leu Leu Ser Ile Lys Asp Asn Leu
105 110 115

CCA AAT GTC AAG AGA TTT GTT TAC ACT TCT TCA TTA GCT GCT GTC CGT 559
Pro Asn Val Lys Arg Phe Val Tyr Thr Ser Ser Leu Ala Ala Val Arg
120 125 130

ACT GAA GGT GCT GGT TAT AGT GCA GAC GAA GTT GTC ACC GAA GAT TCT 607
Thr Glu Gly Ala Gly Tyr Ser Ala Asp Glu Val Val Thr Glu Asp Ser
135 140 145

TGG AAC AAT ATT GCA TTG AAA GAT GCC ACC AAG GAT GAA GGT ACA GCT 655
Trp Asn Asn Ile Ala Leu Lys Asp Ala Thr Lys Asp Glu Gly Thr Ala
150 155 160

TAT GAG GCT TCC AAG ACA TAT GGT GAA AAA GAA GTT TGG AAT TTC TTC 703
Tyr Glu Ala Ser Lys Thr Tyr Gly Glu Lys Glu Val Trp Asn Phe Phe
165 170 175 180

GAA AAA ACT AAA AAT GTT AAT TTC GAT TTT GCC ATC ATC AAC CCA GTT 751
Glu Lys Thr Lys Asn Val Asn Phe Asp Phe Ala Ile Ile Asn Pro Val
185 190 195

TAT GTC TTT GGT CCT CAA TTA TTT GAA GAA TAC GTT ACT GAT AAA TTG 799
Tyr Val Phe Gly Pro Gln Leu Phe Glu Glu Tyr Val Thr Asp Lys Leu
200 205 210

AAC TTT TCC AGT GAA ATC ATT AAT AGT ATA ATA AAA GGT GAA AAG AAG 847
Asn Phe Ser Ser Glu Ile Ile Asn Ser Ile Ile Lys Gly Glu Lys Lys
215 220 225

GAA ATT GAA GGT TAT GAA ATT GAT GTT AGA GAT ATT GCA AGA GCT CAT 895
Glu Ile Glu Gly Tyr Glu Ile Asp Val Arg Asp Ile Ala Arg Ala His
230 235 240

ATC TCT GCT GTT GAA AAT CCA GCA ACT ACA CGT CAA AGA TTA ATT CCA 943
Ile Ser Ala Val Glu Asn Pro Ala Thr Thr Arg Gln Arg Leu Ile Pro
245 250 255 260

GCA GTT GCA CCA TAC AAT CAA CAA ACT ATC TTG GAT GTT TTG AAT GAA 991
Ala Val Ala Pro Tyr Asn Gln Gln Thr Ile Leu Asp Val Leu Asn Glu
265 270 275

AAC TTC CCA GAA TTG AAA GGT AAA ATC GAT GTT GGG AAA CCA GGT TCT 1039
Asn Phe Pro Glu Leu Lys Gly Lys Ile Asp Val Gly Lys Pro Gly Ser
280 285 290

CAA AAT GAA TTT ATT AAA AAA TAT TAT AAA TTA GAT AAC TCA AAG ACC 1087
Gln Asn Glu Phe Ile Lys Lys Tyr Tyr Lys Leu Asp Asn Ser Lys Thr
295 300 305

AAA AAA GTT TTA GGT TTT GAA TTC ATT TCC CAA GAG CAA ACA ATC AAA 1135
Lys Lys Val Leu Gly Phe Glu Phe Ile Ser Gln Glu Gln Thr Ile Lys
310 315 320

GAT GCT GCT GCT CAA ATC TTG TCC GTT AAA AAT GGA AAA AAA 1177
Asp Ala Ala Ala Gln Ile Leu Ser Val Lys Asn Gly Lys Lys
325 330 335

TAAGTGAACT AGACCTGTCA CTATCAGATT ATTAGAGTTC TGTATAGATT AAAGTCTGAA 1237
AATGTATTAG AATCATAATT TTATAATATG CCT 1270

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 338 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Thr Lys Val Phe Val Thr Gly Ala Asn Gly Phe Val Ala Gln His
1 5 10 15

Val Val His Gln Leu Leu Glu Lys Asn Tyr Thr Val Val Gly Ser Val
20 25 30

Arg Ser Thr Glu Lys Gly Asp Lys Leu Ala Lys Leu Leu Asn Asn Pro
35 40 45

Lys Phe Ser Tyr Glu Ile Ile Lys Asp Met Val Asn Ser Arg Asp Glu
50 55 60

Phe Asp Lys Ala Leu Gln Lys His Ser Asp Val Glu Ile Val Leu His
65 70 75 80

Thr Ala Ser Pro Val Phe Pro Gly Gly Ile Lys Asp Val Glu Lys Glu
85 90 95

Met Ile Gln Pro Ala Val Asn Gly Thr Arg Asn Val Leu Leu Ser Ile
100 105 110

Lys Asp Asn Leu Pro Asn Val Lys Arg Phe Val Tyr Thr Ser Ser Leu
115 120 125

Ala Ala Val Arg Thr Glu Gly Ala Gly Tyr Ser Ala Asp Glu Val Val
130 135 140

Thr Glu Asp Ser Trp Asn Asn Ile Ala Leu Lys Asp Ala Thr Lys Asp
145 150 155 160

Glu Gly Thr Ala Tyr Glu Ala Ser Lys Thr Tyr Gly Glu Lys Glu Val
165 170 175

Trp Asn Phe Phe Glu Lys Thr Lys Asn Val Asn Phe Asp Phe Ala Ile
180 185 190

Ile Asn Pro Val Tyr Val Phe Gly Pro Gln Leu Phe Glu Glu Tyr Val
195 200 205

Thr Asp Lys Leu Asn Phe Ser Ser Glu Ile Ile Asn Ser Ile Ile Lys
210 215 220

Gly Glu Lys Lys Glu Ile Glu Gly Tyr Glu Ile Asp Val Arg Asp Ile
225 230 235 240

Ala Arg Ala His Ile Ser Ala Val Glu Asn Pro Ala Thr Thr Arg Gln
245 250 255

Arg Leu Ile Pro Ala Val Ala Pro Tyr Asn Gln Gln Thr Ile Leu Asp
260 265 270

Val Leu Asn Glu Asn Phe Pro Glu Leu Lys Gly Lys Ile Asp Val Gly
275 280 285

Lys Pro Gly Ser Gln Asn Glu Phe Ile Lys Lys Tyr Tyr Lys Leu Asp
290 295 300

Asn Ser Lys Thr Lys Lys Val Leu Gly Phe Glu Phe Ile Ser Gln Glu
305 310 315 320

Gln Thr Ile Lys Asp Ala Ala Ala Gln Ile Leu Ser Val Lys Asn Gly
325 330 335

Lys Lys



(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1271 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: mRNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

UGAAUGGUUA UUUUAGCAAU UGCUGUGUGA GGCACUGACC UAAAGAUGUG UAUAAAUAGU 60

GGGACUGUGU ACUCAUGAGG AUCAAUACAU GUAUAAACUU ACCAUACUUU CACACAAGUC 120

AACUUAGAAU CAAUCAAUCA AUCAAUUAAU CAAGCUAUAC AAUAUGACAA AAGUCUUCGU 180

AACAGGUGCC AACGGAUUCG UUGCUCAACA CGUCGUUCAU CAACUAUUAG AAAAGAACUA 240

UACAGUGGUU GGAUCUGUCC GUUCAACUGA GAAAGGUGAU AAAUUAGCUA AAUUGCUAAA 300

CAAUCCAAAA UUUUCAUAUG AGAUUAUUAA AGAUAUGGUC AAUUCGAGAG AUGAAUUCGA 360

UAAGGCUUUA CAAAAACAUU CAGAUGUUGA AAUUGUCUUA CAUACUGCUU CACCAGUCUU 420

CCCAGGUGGU AUUAAAGAUG UUGAAAAAGA AAUGAUCCAA CCAGCUGUUA AUGGUACUAG 480

AAAUGUCUUG UUAUCAAUCA AGGAUAACUU ACCAAAUGUC AAGAGAUUUG UUUACACUUC 540

UUCAUUAGCU GCUGUCCGUA CUGAAGGUGC UGGUUAUAGU GCAGACGAAG UUGUCACCGA 600

AGAUUCUUGG AACAAUAUUG CAUUGAAAGA UGCCACCAAG GAUGAAGGUA CAGCUUAUGA 660

GGCUUCCAAG ACAUAUGGUG AAAAAGAAGU UUGGAAUUUC UUCGAAAAAA CUAAAAAUGU 720

UAAUUUCGAU UUUGCCAUCA UCAACCCAGU UUAUGUCUUU GGUCCUCAAU UAUUUGAAGA 780

AUACGUUACU GAUAAAUUGA ACUUUUCCAG UGAAAUCAUU AAUAGUAUAA UAAAAGGUGA 840

AAAGAAGGAA AUUGAAGGUU AUGAAAUUGA UGUUAGAGAU AUUGCAAGAG CUCAUAUCUC 900

UGCUGUUGAA AAUCCAGCAA CUACACGUCA AAGAUUAAUU CCAGCAGUUG CACCAUACAA 960

UCAACAAACU AUCUUGGAUG UUUUGAAUGA AAACUUCCCA GAAUUGAAAG GUAAAAUCGA 1020

UGUUGGGAAA CCAGGUUCUC AAAAUGAAUU UAUUAAAAAA UAUUAUAAAU UAGAUAACUC 1080

AAAGACCAAA AAAGUUUUAG GUUUUGAAUU CAUUUCCCAA GAGCAAACAA UCAAAGAUGC 1140

UGCUGCUCAA AUCUUGUCCG UUAAAAAUGG AAAAAAAUAA GUGAACUAGA CCUGUCACUA 1200

UCAGAUUAUU AGAGUUCUGU AUAGAUUAAA GUGUGAAAAU GUAUUAGAAU CAUAAUUUUA 1260

UAAUUAUGCC U 1271

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1032 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..1032
(D) OTHER INFORMATION: S. cerevisiae YDR541c

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:



ATG TCT AAT ACA GTT CTA GTT TCT GGC GCT TCA GGT TTT ATT GCC TTG 48
Met Ser Asn Thr Val Leu Val Ser Gly Ala Ser Gly Phe Ile Ala Leu
1 5 10 15

CAT ATC CTG TCA CAA TTG TTA AAA CAA GAT TAT AAG GTT ATT GGA ACT 96
His Ile Leu Ser Gln Leu Leu Lys Gln Asp Tyr Lys Val Ile Gly Thr
20 25 30

GTG AGA TCC CAT GAA AAA GAA GCA AAA TTG CTA AGA CAA TTT CAA CAT 144
Val Arg Ser His Glu Lys Glu Ala Lys Leu Leu Arg Gln Phe Gln His
35 40 45

AAC CCT AAT TTA ACT TTA GAA ATT GTT CCG GAC ATT TCT CAT CCA AAT 192
Asn Pro Asn Leu Thr Leu Glu Ile Val Pro Asp Ile Ser His Pro Asn
50 55 60

GCT TTC GAT AAG GTT CTG CAG AAA CGT GGA CGT GAG ATT AGG TAT GTT 240
Ala Phe Asp Lys Val Leu Gln Lys Arg Gly Arg Glu Ile Arg Tyr Val
65 70 75 80

CTA CAC ACG GCC TCT CCT TTT CAT TAT GAT ACT ACC GAA TAT GAA AAA 288
Leu His Thr Ala Ser Pro Phe His Tyr Asp Thr Thr Glu Tyr Glu Lys
85 90 95

GAC TTA TTG ATT CCC GCG TTA GAA GGT ACA AAA AAC ATC CTA AAT TCT 336
Asp Leu Leu Ile Pro Ala Leu Glu Gly Thr Lys Asn Ile Leu Asn Ser
100 105 110

ATC AAG AAA TAT GCA GCA GAC ACT GTA GAG CGT GTT GTT GTG ACT TCT 384
Ile Lys Lys Tyr Ala Ala Asp Thr Val Glu Arg Val Val Val Thr Ser
115 120 125

TCT TGT ACT GCT ATT ATA ACC CTT GCA AAG ATG GAC GAT CCC AGT GTG 432
Ser Cys Thr Ala Ile Ile Thr Leu Ala Lys Met Asp Asp Pro Ser Val
130 135 140

GTT TTT ACA GAA GAG AGT TGG AAC GAA GCA ACC TGG GAA AGC TGT CAA 480
Val Phe Thr Glu Glu Ser Trp Asn Glu Ala Thr Trp Glu Ser Cys Gln
145 150 155 160

ATT GAT GGG ATA AAT GCT TAC TTT GCA TCC AAG AAG TTT GCT GAA AAG 528
Ile Asp Gly Ile Asn Ala Tyr Phe Ala Ser Lys Lys Phe Ala Glu Lys
165 170 175

GCT GCC TGG GAG TTC ACA AAA GAG AAT GAA GAT CAC ATC AAA TTC AAA 576
Ala Ala Trp Glu Phe Thr Lys Glu Asn Glu Asp His Ile Lys Phe Lys
180 185 190

CTA ACA ACA GTC AAC CCT TCT CTT CTT TTT GGT CCT CAA CTT TTC GAT 624
Leu Thr Thr Val Asn Pro Ser Leu Leu Phe Gly Pro Gln Leu Phe Asp
195 200 205

GAA GAT GTG CAT GGC CAT TTG AAT ACT TCT TGC GAA ATG ATC AAT GGC 672
Glu Asp Val His Gly His Leu Asn Thr Ser Cys Glu Met Ile Asn Gly
210 215 220

CTA ATT CAT ACC CCA GTA AAT GCC AGT GTT CCT GAT TTT CAT TCC ATT 720
Leu Ile His Thr Pro Val Asn Ala Ser Val Pro Asp Phe His Ser Ile
225 230 235 240

TTT ATT GAT GTA AGG GAT GTG GCC CTA GCT CAT CTG TAT GCT TTC CAG 768
Phe Ile Asp Val Arg Asp Val Ala Leu Ala His Leu Tyr Ala Phe Gln
245 250 255

AAG GAA AAT ACC GCG GGT AAA AGA TTA GTG GTA ACT AAC GGT AAA TTT 816
Lys Glu Asn Thr Ala Gly Lys Arg Leu Val Val Thr Asn Gly Lys Phe
260 265 270

GGA AAC CAA GAT ATC CTG GAT ATT TTG AAC GAA GAT TTT CCA CAA TTA 864
Gly Asn Gln Asp Ile Leu Asp Ile Leu Asn Glu Asp Phe Pro Gln Leu
275 280 285

AGA GGT CTC ATT CCT TTG GGT AAG CCT GGC ACA GGT GAT CAA GTC ATT 912
Arg Gly Leu Ile Pro Leu Gly Lys Pro Gly Thr Gly Asp Gln Val Ile
290 295 300

GAC CGC GGT TCA ACT ACA GAT AAT AGT GCA ACG AGG AAA ATA CTT GGC 960
Asp Arg Gly Ser Thr Thr Asp Asn Ser Ala Thr Arg Lys Ile Leu Gly
305 310 315 320

TTT GAG TTC AGA AGT TTA CAC GAA AGT GTC CAT GAT ACT GCT GCC CAA 1008
Phe Glu Phe Arg Ser Leu His Glu Ser Val His Asp Thr Ala Ala Gln
325 330 335

ATT TTG AAG AAG GAG AAC AGA TTA 1032
Ile Leu Lys Lys Glu Asn Arg Leu
340

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 344 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

Met Ser Asn Thr Val Leu Val Ser Gly Ala Ser Gly Phe Ile Ala Leu
1 5 10 15

His Ile Leu Ser Gln Leu Leu Lys Gln Asp Tyr Lys Val Ile Gly Thr
20 25 30

Val Arg Ser His Glu Lys Glu Ala Lys Leu Leu Arg Gln Phe Gln His
35 40 45

Asn Pro Asn Leu Thr Leu Glu Ile Val Pro Asp Ile Ser His Pro Asn
50 55 60

Ala Phe Asp Lys Val Leu Gln Lys Arg Gly Arg Glu Ile Arg Tyr Val
65 70 75 80

Leu His Thr Ala Ser Pro Phe His Tyr Asp Thr Thr Glu Tyr Glu Lys
85 90 95

Asp Leu Leu Ile Pro Ala Leu Glu Gly Thr Lys Asn Ile Leu Asn Ser
100 105 110

Ile Lys Lys Tyr Ala Ala Asp Thr Val Glu Arg Val Val Val Thr Ser
115 120 125

Ser Cys Thr Ala Ile Ile Thr Leu Ala Lys Met Asp Asp Pro Ser Val
130 135 140

Val Phe Thr Glu Glu Ser Trp Asn Glu Ala Thr Trp Glu Ser Cys Gln
145 150 155 160

Ile Asp Gly Ile Asn Ala Tyr Phe Ala Ser Lys Lys Phe Ala Glu Lys
165 170 175

Ala Ala Trp Glu Phe Thr Lys Glu Asn Glu Asp His Ile Lys Phe Lys
180 185 190

Leu Thr Thr Val Asn Pro Ser Leu Leu Phe Gly Pro Gln Leu Phe Asp
195 200 205

Glu Asp Val His Gly His Leu Asn Thr Ser Cys Glu Met Ile Asn Gly
210 215 220

Leu Ile His Thr Pro Val Asn Ala Ser Val Pro Asp Phe His Ser Ile
225 230 235 240

Phe Ile Asp Val Arg Asp Val Ala Leu Ala His Leu Tyr Ala Phe Gln
245 250 255

Lys Glu Asn Thr Ala Gly Lys Arg Leu Val Val Thr Asn Gly Lys Phe
260 265 270

Gly Asn Gln Asp Ile Leu Asp Ile Leu Asn Glu Asp Phe Pro Gln Leu
275 280 285

Arg Gly Leu Ile Pro Leu Gly Lys Pro Gly Thr Gly Asp Gln Val Ile
290 295 300

Asp Arg Gly Ser Thr Thr Asp Asn Ser Ala Thr Arg Lys Ile Leu Gly
305 310 315 320

Phe Glu Phe Arg Ser Leu His Glu Ser Val His Asp Thr Ala Ala Gln
325 330 335

Ile Leu Lys Lys Glu Asn Arg Leu
340

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1032 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: mRNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

AUGUCUAAUA CAGUUCUAGU UUCUGGCGCU UCAGGUUUUA UUGCCUUGCA UAUCCUGUCA 60

CAAUUGUUAA AACAAGAUUA UAAGGUUAUU GGAACUGUGA GAUCCCAUGA AAAAGAAGCA 120

AAAUUGCUAA GACAAUUUCA ACAUAACCCU AAUUUAACUU UAGAAAUUGU UCCGGACAUU 180

UCUCAUCCAA AUGCUUUCGA UAAGGUUCUG CAGAAACGUG GACGUGAGAU UAGGUAUGUU 240

CUACACACGG CCUCUCCUUU UCAUUAUGAU ACUACCGAAU AUGAAAAAGA CUUAUUGAUU 300

CCCGCGUUAG AAGGUACAAA AAACAUCCUA AAUUCUAUCA AGAAAUAUGC AGCAGACACU 360

GUAGAGCGUG UUGUUGUGAC UUCUUCUUGU ACUGCUAUUA UAACCCUUGC AAAGAUGGAC 420

GAUCCCAGUG UGGUUUUUAC AGAAGAGAGU UGGAACGAAG CAACCUGGGA AAGCUGUCAA 480

AUUGAUGGGA UAAAUGCUUA CUUUGCAUCC AAGAAGUUUG CUGAAAAGGC UGCCUGGGAG 540

UUCACAAAAG AGAAUGAAGA UCACAUCAAA UUCAAACUAA CAACAGUCAA CCCUUCUCUU 600

CUUUUUGGUC CUCAACUUUU CGAUGAAGAU GUGCAUGGCC AUUUGAAUAC UUCUUGCGAA 660

AUGAUCAAUG GCCUAAUUCA UACCCCAGUA AAUGCCAGUG UUCCUGAUUU UCAUUCCAUU 720

UUUAUUGAUG UAAGGGAUGU GGCCCUAGCU CAUCUGUAUG CUUUCCAGAA GGAAAAUACC 780

GCGGGUAAAA GAUUAGUGGU AACUAACGGU AAAUUUGGAA ACCAAGAUAU CCUGGAUAUU 840

UUGAACGAAG AUUUUCCACA AUUAAGAGGU CUCAUUCCUU UGGGUAAGCC UGGCACAGGU 900

GAUCAAGUCA UUGACCGCGG UUCAACUACA GAUAAUAGUG CAACGAGGAA AAUACUUGGC 960

UUUGAGUUCA GAAGUUUACA CGAAAGUGUC CAUGAUACUG CUGCCCAAAU UUUGAAGAAG 1020

GAGAACAGAU UA 1032

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1029 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..1026
(D) OTHER INFORMATION: S. cerevisiae YOL151W

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

ATG TCA GTT TTC GTT TCA GGT GCT AAC GGG TTC ATT GCC CAA CAC ATT 48
Met Ser Val Phe Val Ser Gly Ala Asn Gly Phe Ile Ala Gln His Ile
1 5 10 15

GTC GAT CTC CTG TTG AAG GAA GAC TAT AAG GTC ATC GGT TCT GCC AGA 96
Val Asp Leu Leu Leu Lys Glu Asp Tyr Lys Val Ile Gly Ser Ala Arg
20 25 30

AGT CAA GAA AAG GCC GAG AAT TTA ACG GAG GCC TTT GGT AAC AAC CCA 144
Ser Gln Glu Lys Ala Glu Asn Leu Thr Glu Ala Phe Gly Asn Asn Pro
35 40 45

AAA TTC TCC ATG GAA GTT GTC CCA GAC ATA TCT AAG CTG GAC GCA TTT 192
Lys Phe Ser Met Glu Val Val Pro Asp Ile Ser Lys Leu Asp Ala Phe
50 55 60

GAC CAT GTT TTC CAA AAG CAC GGC AAG GAT ATC AAG ATA GTT CTA CAT 240
Asp His Val Phe Gln Lys His Gly Lys Asp Ile Lys Ile Val Leu His
65 70 75 80

ACG GCC TCT CCA TTC TGC TTT GAT ATC ACT GAC AGT GAA CGC GAT TTA 288
Thr Ala Ser Pro Phe Cys Phe Asp Ile Thr Asp Ser Glu Arg Asp Leu
85 90 95

TTA ATT CCT GCT GTG AAC GGT GTT AAG GGA ATT CTC CAC TCA ATT AAA 336
Leu Ile Pro Ala Val Asn Gly Val Lys Gly Ile Leu His Ser Ile Lys
100 105 110

AAA TAC GCC GCT GAT TCT GTA GAA CGT GTA GTT CTC ACC TCT TCT TAT 384
Lys Tyr Ala Ala Asp Ser Val Glu Arg Val Val Leu Thr Ser Ser Tyr
115 120 125

GCA GCT GTG TTC GAT ATG GCA AAA GAA AAC GAT AAG TCT TTA ACA TTT 432
Ala Ala Val Phe Asp Met Ala Lys Glu Asn Asp Lys Ser Leu Thr Phe
130 135 140

AAC GAA GAA TCC TGG AAC CCA GCT ACC TGG GAG AGT TGC CAA AGT GAC 480
Asn Glu Glu Ser Trp Asn Pro Ala Thr Trp Glu Ser Cys Gln Ser Asp
145 150 155 160

CCA GTT AAC GCC TAC TGT GGT TCT AAG AAG TTT GCT GAA AAA GCA GCT 528
Pro Val Asn Ala Tyr Cys Gly Ser Lys Lys Phe Ala Glu Lys Ala Ala
165 170 175

TGG GAA TTT CTA GAG GAG AAT AGA GAC TCT GTA AAA TTC GAA TTA ACT 576
Trp Glu Phe Leu Glu Glu Asn Arg Asp Ser Val Lys Phe Glu Leu Thr
180 185 190

GCC GTT AAC CCA GTT TAC GTT TTT GGT CCG CAA ATG TTT GAC AAA GAT 624
Ala Val Asn Pro Val Tyr Val Phe Gly Pro Gln Met Phe Asp Lys Asp
195 200 205

GTG AAA AAA CAC TTG AAC ACA TCT TGC GAA CTC GTC AAC AGC TTG ATG 672
Val Lys Lys His Leu Asn Thr Ser Cys Glu Leu Val Asn Ser Leu Met
210 215 220

CAT TTA TCA CCA GAG GAC AAG ATA CCG GAA CTA TTT GGT GGA TAC ATT 720
His Leu Ser Pro Glu Asp Lys Ile Pro Glu Leu Phe Gly Gly Tyr Ile
225 230 235 240

GAT GTT CGT GAT GTT GCA AAG GCT CAT TTA GTT GCC TTC CAA AAG AGG 768
Asp Val Arg Asp Val Ala Lys Ala His Leu Val Ala Phe Gln Lys Arg
245 250 255

GAA ACA ATT GGT CAA AGA CTA ATC GTA TCG GAG GCC AGA TTT ACT ATG 816
Glu Thr Ile Gly Gln Arg Leu Ile Val Ser Glu Ala Arg Phe Thr Met
260 265 270

CAG GAT GTT CTC GAT ATC CTT AAC GAA GAC TTC CCT GTT CTA AAA GGC 864
Gln Asp Val Leu Asp Ile Leu Asn Glu Asp Phe Pro Val Leu Lys Gly
275 280 285

AAT ATT CCA GTG GGG AAA CCA GGT TCT GGT GCT ACC CAT AAC ACC CTT 912
Asn Ile Pro Val Gly Lys Pro Gly Ser Gly Ala Thr His Asn Thr Leu
290 295 300

GGT GCT ACT CTT GAT AAT AAA AAG AGT AAG AAA TTG TTA GGT TTC AAG 960
Gly Ala Thr Leu Asp Asn Lys Lys Ser Lys Lys Leu Leu Gly Phe Lys
305 310 315 320

TTC AGG AAC TTG AAA GAG ACC ATT GAC GAC ACT GCC TCC CAA ATT TTA 1008
Phe Arg Asn Leu Lys Glu Thr Ile Asp Asp Thr Ala Ser Gln Ile Leu
325 330 335

AAA TTT GAG GGC AGA ATA TAA 1029
Lys Phe Glu Gly Arg Ile
340



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 342 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Met Ser Val Phe Val Ser Gly Ala Asn Gly Phe Ile Ala Gln His Ile
1 5 10 15

Val Asp Leu Leu Leu Lys Glu Asp Tyr Lys Val Ile Gly Ser Ala Arg
20 25 30

Ser Gln Glu Lys Ala Glu Asn Leu Thr Glu Ala Phe Gly Asn Asn Pro
35 40 45

Lys Phe Ser Met Glu Val Val Pro Asp Ile Ser Lys Leu Asp Ala Phe
50 55 60

Asp His Val Phe Gln Lys His Gly Lys Asp Ile Lys Ile Val Leu His
65 70 75 80

Thr Ala Ser Pro Phe Cys Phe Asp Ile Thr Asp Ser Glu Arg Asp Leu
85 90 95

Leu Ile Pro Ala Val Asn Gly Val Lys Gly Ile Leu His Ser Ile Lys
100 105 110

Lys Tyr Ala Ala Asp Ser Val Glu Arg Val Val Leu Thr Ser Ser Tyr
115 120 125

Ala Ala Val Phe Asp Met Ala Lys Glu Asn Asp Lys Ser Leu Thr Phe
130 135 140

Asn Glu Glu Ser Trp Asn Pro Ala Thr Trp Glu Ser Cys Gln Ser Asp
145 150 155 160

Pro Val Asn Ala Tyr Cys Gly Ser Lys Lys Phe Ala Glu Lys Ala Ala
165 170 175

Trp Glu Phe Leu Glu Glu Asn Arg Asp Ser Val Lys Phe Glu Leu Thr
180 185 190

Ala Val Asn Pro Val Tyr Val Phe Gly Pro Gln Met Phe Asp Lys Asp
195 200 205

Val Lys Lys His Leu Asn Thr Ser Cys Glu Leu Val Asn Ser Leu Met
210 215 220

His Leu Ser Pro Glu Asp Lys Ile Pro Glu Leu Phe Gly Gly Tyr Ile
225 230 235 240

Asp Val Arg Asp Val Ala Lys Ala His Leu Val Ala Phe Gln Lys Arg
245 250 255

Glu Thr Ile Gly Gln Arg Leu Ile Val Ser Glu Ala Arg Phe Thr Met
260 265 270

Gln Asp Val Leu Asp Ile Leu Asn Glu Asp Phe Pro Val Leu Lys Gly
275 280 285

Asn Ile Pro Val Gly Lys Pro Gly Ser Gly Ala Thr His Asn Thr Leu
290 295 300

Gly Ala Thr Leu Asp Asn Lys Lys Ser Lys Lys Leu Leu Gly Phe Lys
305 310 315 320

Phe Arg Asn Leu Lys Glu Thr Ile Asp Asp Thr Ala Ser Gln Ile Leu
325 330 335

Lys Phe Glu Gly Arg Ile
340

(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1026 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: mRNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

AUGUCAGUUU UCGUUUCAGG UGCUAACGGG UUCAUUGCCC AACACAUUGU CGAUCUCCUG 60

UUGAAGGAAG ACUAUAAGGU CAUCGGUUCU GCCAGAAGUC AAGAAAAGGC CGAGAAUUUA 120

ACGGAGGCCU UUGGUAACAA CCCAAAAUUC UCCAUGGAAG UUGUCCCAGA CAUAUCUAAG 180

CUGGACGCAU UUGACCAUGU UUUCCAAAAG CACGGCAAGG AUAUCAAGAU AGUUCUACAU 240

ACGGCCUCUC CAUUCUGCUU UGAUAUCACU GACAGUGAAC GCGAUUUAUU AAUUCCUGCU 300

GUGAACGGUG UUAAGGGAAU UCUCCACUCA AUUAAAAAAU ACGCCGCUGA UUCUGUAGAA 360

CGUGUAGUUC UCACCUCUUC UUAUGCAGCU GUGUUCGAUA UGGCAAAAGA AAACGAUAAG 420

UCUUUAACAU UUAACGAAGA AUCCUGGAAC CCAGCUACCU GGGAGAGUUG CCAAAGUGAC 480

CCAGUUAACG CCUACUGUGG UUCUAAGAAG UUUGCUGAAA AAGCAGCUUG GGAAUUUCUA 540

GAGGAGAAUA GAGACUCUGU AAAAUUCGAA UUAACUGCCG UUAACCCAGU UUACGUUUUU 600

GGUCCGCAAA UGUUUGACAA AGAUGUGAAA AAACACUUGA ACACAUCUUG CGAACUCGUC 660

AACAGCUUGA UGCAUUUAUC ACCAGAGGAC AAGAUACCGG AACUAUUUGG UGGAUACAUU 720

GAUGUUCGUG AUGUUGCAAA GGCUCAUUUA GUUGCCUUCC AAAAGAGGGA AACAAUUGGU 780

CAAAGACUAA UCGUAUCGGA GGCCAGAUUU ACUAUGCAGG AUGUUCUCGA UAUCCUUAAC 840

GAAGACUUCC CUGUUCUAAA AGGCAAUAUU CCAGUGGGGA AACCAGGUUC UGGUGCUACC 900

CAUAACACCC UUGGUGCUAC UCUUGAUAAU AAAAAGAGUA AGAAAUUGUU AGGUUUCAAG 960

UUCAGGAACU UGAAAGAGAC CAUUGACGAC ACUGCCUCCC AAAUUUUAAA AUUUGAGGGC 1020

AGAAUA 1026


(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1041 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..1041
(D) OTHER INFORMATION: S. cerevisiae YGL157W

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

ATG ACT ACT GAT ACC ACT GTT TTC GTT TCT GGC GCA ACC GGT TTC ATT
Met Thr Thr Asp Thr Thr Val Phe Val Ser Gly Ala Thr Gly Phe Ile
1 5 10 15

GCT CTA CAC ATT ATG AAC GAT CTG TTG AAA GCT GGC TAT ACA GTC ATC
Ala Leu His Ile Met Asn Asp Leu Leu Lys Ala Gly Tyr Thr Val Ile
20 25 30

GGC TCA GGT AGA TCT CAA GAA AAA AAT GAT GGC TTG CTC AAA AAA TTT
Gly Ser Gly Arg Ser Gln Glu Lys Asn Asp Gly Leu Leu Lys Lys Phe
35 40 45

AAT AAC AAT CCC AAA CTA TCG ATG GAA ATT GTG GAA GAT ATT GCT GCT
Asn Asn Asn Pro Lys Leu Ser Met Glu Ile Val Glu Asp Ile Ala Ala
50 55 60

CCA AAC GCC TTT GAT GAA GTT TTC AAA AAA CAT GGT AAG GAA ATT AAG
Pro Asn Ala Phe Asp Glu Val Phe Lys Lys His Gly Lys Glu Ile Lys
65 70 75 80

ATT GTG CTA CAC ACT GCC TCC CCA TTC CAT TTT GAA ACT ACC AAT TTT
Ile Val Leu His Thr Ala Ser Pro Phe His Phe Glu Thr Thr Asn Phe
85 90 95

GAA AAG GAT TTA CTA ACC CCT GCA GTG AAC GGT ACA AAA TCT ATC TTG
Glu Lys Asp Leu Leu Thr Pro Ala Val Asn Gly Thr Lys Ser Ile Leu
100 105 110

GAA GCG ATT AAA AAA TAT GCT GCA GAC ACT GTT GAA AAA GTT ATT GTT
Glu Ala Ile Lys Lys Tyr Ala Ala Asp Thr Val Glu Lys Val Ile Val
115 120 125

ACT TCG TCT ACT GCT GCT CTG GTG ACA CCT ACA GAC ATG AAC AAA GGA
Thr Ser Ser Thr Ala Ala Leu Val Thr Pro Thr Asp Met Asn Lys Gly
130 135 140

GAT TTG GTG ATC ACG GAG GAG AGT TGG AAT AAG GAT ACA TGG GAC AGT
Asp Leu Val Ile Thr Glu Glu Ser Trp Asn Lys Asp Thr Trp Asp Ser
145 150 155 160

TGT CAA GCC AAC GCC GTT GCC GCA TAT TGT GGC TCG AAA AAG TTT GCT
Cys Gln Ala Asn Ala Val Ala Ala Tyr Cys Gly Ser Lys Lys Phe Ala
165 170 175

GAA AAA ACT GCT TGG GAA TTT CTT AAA GAA AAC AAG TCT AGT GTC AAA
Glu Lys Thr Ala Trp Glu Phe Leu Lys Glu Asn Lys Ser Ser Val Lys
180 185 190

TTC ACA CTA TCC ACT ATC AAT CCG GGA TTC GTT TTT GGT CCT CAA ATG
Phe Thr Leu Ser Thr Ile Asn Pro Gly Phe Val Phe Gly Pro Gln Met
195 200 205

TTT GCA GAT TCG CTA AAA CAT GGC ATA AAT ACC TCC TCA GGG ATC GTA
Phe Ala Asp Ser Leu Lys His Gly Ile Asn Thr Ser Ser Gly Ile Val
210 215 220

TCT GAG TTA ATT CAT TCC AAG GTA GGT GGA GAA TTT TAT AAT TAC TGT
Ser Glu Leu Ile His Ser Lys Val Gly Gly Glu Phe Tyr Asn Tyr Cys
225 230 235 240
GGC CCA TTT ATT GAC GTG CGT GAC GTT TCT AAA GCC CAC CTA GTT GCA
Gly Pro Phe Ile Asp Val Arg Asp Val Ser Lys Ala His Leu Val Ala
245 250 255

ATT GAA AAA CCA GAA TGT ACC GGC CAA AGA TTA GTA TTG AGT GAA GGT
Ile Glu Lys Pro Glu Cys Thr Gly Gln Arg Leu Val Leu Ser Glu Gly
260 265 270

TTA TTC TGC TGT CAA GAA ATC GTT GAC ATC TTG AAC GAG GAA TTC CCT
Leu Phe Cys Cys Gln Glu Ile Val Asp Ile Leu Asn Glu Glu Phe Pro
275 280 285

CAA TTA AAG GGC AAG ATA GCT ACA GGT GAA CCT GCG ACC GGT CCA AGC
Gln Leu Lys Gly Lys Ile Ala Thr Gly Glu Pro Ala Thr Gly Pro Ser
290 295 300

TTT TTA GAA AAA AAC TCT TGC AAG TTT GAC AAT TCT AAG ACA AAA AAA
Phe Leu Glu Lys Asn Ser Cys Lys Phe Asp Asn Ser Lys Thr Lys Lys
305 310 315 320

CTA CTG GGA TTC CAG TTT TAC AAT TTA AAG GAT TGC ATA GTT GAC ACC
Leu Leu Gly Phe Gln Phe Tyr Asn Leu Lys Asp Cys Ile Val Asp Thr
325 330 335

GCG GCG CAA ATG TTA GAA GTT CAA AAT GAA GCC
Ala Ala Gln Met Leu Glu Val Gln Asn Glu Ala
340 345



(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 347 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Met Thr Thr Asp Thr Thr Val Phe Val Ser Gly Ala Thr Gly Phe Ile
1 5 10 15

Ala Leu His Ile Met Asn Asp Leu Leu Lys Ala Gly Tyr Thr Val Ile
20 25 30

Gly Ser Gly Arg Ser Gln Glu Lys Asn Asp Gly Leu Leu Lys Lys Phe
35 40 45

Asn Asn Asn Pro Lys Leu Ser Met Glu Ile Val Glu Asp Ile Ala Ala
50 55 60

Pro Asn Ala Phe Asp Glu Val Phe Lys Lys His Gly Lys Glu Ile Lys
65 70 75 80

Ile Val Leu His Thr Ala Ser Pro Phe His Phe Glu Thr Thr Asn Phe
85 90 95

Glu Lys Asp Leu Leu Thr Pro Ala Val Asn Gly Thr Lys Ser Ile Leu
100 105 110

Glu Ala Ile Lys Lys Tyr Ala Ala Asp Thr Val Glu Lys Val Ile Val
115 120 125

Thr Ser Ser Thr Ala Ala Leu Val Thr Pro Thr Asp Met Asn Lys Gly
130 135 140

Asp Leu Val Ile Thr Glu Glu Ser Trp Asn Lys Asp Thr Trp Asp Ser
145 150 155 160

Cys Gln Ala Asn Ala Val Ala Ala Tyr Cys Gly Ser Lys Lys Phe Ala
165 170 175

Glu Lys Thr Ala Trp Glu Phe Leu Lys Glu Asn Lys Ser Ser Val Lys
180 185 190

Phe Thr Leu Ser Thr Ile Asn Pro Gly Phe Val Phe Gly Pro Gln Met
195 200 205

Phe Ala Asp Ser Leu Lys His Gly Ile Asn Thr Ser Ser Gly Ile Val
210 215 220

Ser Glu Leu Ile His Ser Lys Val Gly Gly Glu Phe Tyr Asn Tyr Cys
225 230 235 240

Gly Pro Phe Ile Asp Val Arg Asp Val Ser Lys Ala His Leu Val Ala
245 250 255

Ile Glu Lys Pro Glu Cys Thr Gly Gln Arg Leu Val Leu Ser Glu Gly
260 265 270

Leu Phe Cys Cys Gln Glu Ile Val Asp Ile Leu Asn Glu Glu Phe Pro
275 280 285

Gln Leu Lys Gly Lys Ile Ala Thr Gly Glu Pro Ala Thr Gly Pro Ser
290 295 300

Phe Leu Glu Lys Asn Ser Cys Lys Phe Asp Asn Ser Lys Thr Lys Lys
305 310 315 320

Leu Leu Gly Phe Gln Phe Tyr Asn Leu Lys Asp Cys Ile Val Asp Thr
325 330 335

Ala Ala Gln Met Leu Glu Val Gln Asn Glu Ala
340 345

(2) INFORMATION FOR SEQ ID NO: 12: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1041 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: mRNA
(iii) HYPOTHETICAL: NO 


(iv) ANTI-SENSE: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

AUGACUACUG AUACCACUGU UUUCGUUUCU GGCGCAACCG GUUUCAUUGC UCUACACAUU 60

AUGAACGAUC UGUUGAAAGC UGGCUAUACA GUCAUCGGCU CAGGUAGAUC UCAAGAAAAA 120

AAUGAUGGCU UGCUCAAAAA AUUUAAUAAC AAUCCCAAAC UAUCGAUGGA AAUUGUGGAA 180

GAUAUUGCUG CUCCAAACGC CUUUGAUGAA GUUUUCAAAA AACAUGGUAA GGAAAUUAAG 240

AUUGUGCUAC ACACUGCCUC CCCAUUCCAU UUUGAAACUA CCAAUUUUGA AAAGGAUUUA 300

CUAACCCCUG CAGUGAACGG UACAAAAUCU AUCUUGGAAG CGAUUAAAAA AUAUGCUGCA 360

GACACUGUUG AAAAAGUUAU UGUUACUUCG UCUACUGCUG CUCUGGUGAC ACCUACAGAC 420

AUGAACAAAG GAGAUUUGGU GAUCACGGAG GAGAGUUGGA AUAAGGAUAC AUGGGACAGU 480

UGUCAAGCCA ACGCCGUUGC CGCAUAUUGU GGCUCGAAAA AGUUUGCUGA AAAAACUGCU 540

UGGGAAUUUC UUAAAGAAAA CAAGUCUAGU GUCAAAUUCA CACUAUCCAC UAUCAAUCCG 600

GGAUUCGUUU UUGGUCCUCA AAUGUUUGCA GAUUCGCUAA AACAUGGCAU AAAUACCUCC 660

UCAGGGAUCG UAUCUGAGUU AAUUCAUUCC AAGGUAGGUG GAGAAUUUUA UAAUUACUGU 720

GGCCCAUUUA UUGACGUGCG UGACGUUUCU AAAGCCCACC UAGUUGCAAU UGAAAAACCA 780

GAAUGUACCG GCCAAAGAUU AGUAUUGAGU GAAGGUUUAU UCUGCUGUCA AGAAAUCGUU 840

GACAUCUUGA ACGAGGAAUU CCCUCAAUUA AAGGGCAAGA UAGCUACAGG UGAACCUGCG 900

ACCGGUCCAA GCUUUUUAGA AAAAAACUCU UGCAAGUUUG ACAAUUCUAA GACAAAAAAA 960

CUACUGGGAU UCCAGUUUUA CAAUUUAAAG GAUUGCAUAG UUGACACCGC GGCGCAAAUG 1020

UUAGAAGUUC AAAAUGAAGC C 1041







(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1044 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..1044

(D) OTHER INFORMATION: S. cerevisiae YGL039W

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

ATG ACT ACT GAA AAA ACC GTT GTT TTT GTT TCT GGT GCT ACT GGT TTC
Met Thr Thr Glu Lys Thr Val Val Phe Val Ser Gly Ala Thr Gly Phe
1 5 10 15

ATT GCT CTA CAC GTA GTG GAC GAT TTA TTA AAA ACT GGT TAC AAG GTC
Ile Ala Leu His Val Val Asp Asp Leu Leu Lys Thr Gly Tyr Lys Val
20 25 30

ATC GGT TCG GGT AGG TCC CAA GAA AAG AAT GAT GGA TTG CTG AAA AAA
Ile Gly Ser Gly Arg Ser Gln Glu Lys Asn Asp Gly Leu Leu Lys Lys
35 40 45

TTT AAG AGC AAT CCC AAC CTT TCA ATG GAG ATT GTC GAA GAC ATT GCT
Phe Lys Ser Asn Pro Asn Leu Ser Met Glu Ile Val Glu Asp Ile Ala
50 55 60

GCT CCA AAC GCT TTT GAC AAA GTT TTT CAA AAG CAC GGC AAA GAG ATC
Ala Pro Asn Ala Phe Asp Lys Val Phe Gln Lys His Gly Lys Glu Ile
65 70 75 80

AAG GTT GTC TTG CAC ATA GCT TCT CCG GTT CAC TTC AAC ACC ACT GAT
Lys Val Val Leu His Ile Ala Ser Pro Val His Phe Asn Thr Thr Asp
85 90 95

TTC GAA AAG GAT CTG CTA ATT CCT GCT GTG AAT GGT ACC AAG TCC ATT
Phe Glu Lys Asp Leu Leu Ile Pro Ala Val Asn Gly Thr Lys Ser Ile
100 105 110

CTA GAA GCA ATC AAA AAT TAT GCC GCA GAC ACA GTC GAA AAA GTC GTT
Leu Glu Ala Ile Lys Asn Tyr Ala Ala Asp Thr Val Glu Lys Val Val
115 120 125

ATT ACT TCT TCT GTT GCT GCC CTT GCA TCT CCC GGA GAT ATG AAG GAC
Ile Thr Ser Ser Val Ala Ala Leu Ala Ser Pro Gly Asp Met Lys Asp
130 135 140

ACT AGT TTC GTT GTC AAT GAG GAA AGT TGG AAC AAA GAT ACT TGG GAA
Thr Ser Phe Val Val Asn Glu Glu Ser Trp Asn Lys Asp Thr Trp Glu
145 150 155 160

AGT TGT CAA GCT AAC GCG GTT TCC GCA TAC TGT GGT TCC AAG AAA TTT
Ser Cys Gln Ala Asn Ala Val Ser Ala Tyr Cys Gly Ser Lys Lys Phe
165 170 175

GCT GAA AAA ACT GCT TGG GAT TTT CTC GAG GAA AAC CAA TCA AGC ATC
Ala Glu Lys Thr Ala Trp Asp Phe Leu Glu Glu Asn Gln Ser Ser Ile
180 185 190

AAA TTT ACG CTA TCA ACC ATC AAC CCA GGA TTT GTT TTT GGC CCT CAG
Lys Phe Thr Leu Ser Thr Ile Asn Pro Gly Phe Val Phe Gly Pro Gln
195 200 205

CTA TTT GCC GAC TCT CTT AGA AAT GGA ATA AAT AGC TCT TCA GCC ATT
Leu Phe Ala Asp Ser Leu Arg Asn Gly Ile Asn Ser Ser Ser Ala Ile
210 215 220

ATT GCC AAT TTG GTT AGT TAT AAA TTA GGC GAC AAT TTT TAT AAT TAC
Ile Ala Asn Leu Val Ser Tyr Lys Leu Gly Asp Asn Phe Tyr Asn Tyr
225 230 235 240

AGT GGT CCT TTT ATT GAC GTT CGC GAT GTT TCA AAA GCT CAT TTA CTT
Ser Gly Pro Phe Ile Asp Val Arg Asp Val Ser Lys Ala His Leu Leu
245 250 255

GCA TTT GAG AAA CCC GAA TGC GCT GGC CAA AGA CTA TTC TTA TGT GAA
Ala Phe Glu Lys Pro Glu Cys Ala Gly Gln Arg Leu Phe Leu Cys Glu
260 265 270

GAT ATG TTT TGC TCT CAA GAA GCG CTG GAT ATC TTG AAT GAG GAA TTT
Asp Met Phe Cys Ser Gln Glu Ala Leu Asp Ile Leu Asn Glu Glu Phe
275 280 285

CCA CAG TTA AAA GGC AAG ATA GCA ACT GGC GAA CCT GGT AGC GGC TCA
Pro Gln Leu Lys Gly Lys Ile Ala Thr Gly Glu Pro Gly Ser Gly Ser
290 295 300

ACC TTT TTG ACA AAA AAC TGC TGC AAG TGC GAC AAC CGC AAA ACC AAA
Thr Phe Leu Thr Lys Asn Cys Cys Lys Cys Asp Asn Arg Lys Thr Lys
305 310 315 320

AAT TTA TTA GGA TTC CAA TTT AAT AAG TTC AGA GAT TGC ATT GTC GAT 1008
Asn Leu Leu Gly Phe Gln Phe Asn Lys Phe Arg Asp Cys Ile Val Asp
325 330 335

ACT GCC TCG CAA TTA CTA GAA GTT CAA AGT AAA AGC 1044
Thr Ala Ser Gln Leu Leu Glu Val Gln Ser Lys Ser
340 345

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 348 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Met Thr Thr Glu Lys Thr Val Val Phe Val Ser Gly Ala Thr Gly Phe
1 5 10 15

Ile Ala Leu His Val Val Asp Asp Leu Leu Lys Thr Gly Tyr Lys Val
20 25 30

Ile Gly Ser Gly Arg Ser Gln Glu Lys Asn Asp Gly Leu Leu Lys Lys
35 40 45

Phe Lys Ser Asn Pro Asn Leu Ser Met Glu Ile Val Glu Asp Ile Ala
50 55 60

Ala Pro Asn Ala Phe Asp Lys Val Phe Gln Lys His Gly Lys Glu Ile
65 70 75 80

Lys Val Val Leu His Ile Ala Ser Pro Val His Phe Asn Thr Thr Asp
85 90 95

Phe Glu Lys Asp Leu Leu Ile Pro Ala Val Asn Gly Thr Lys Ser Ile
100 105 110

Leu Glu Ala Ile Lys Asn Tyr Ala Ala Asp Thr Val Glu Lys Val Val
115 120 125

Ile Thr Ser Ser Val Ala Ala Leu Ala Ser Pro Gly Asp Met Lys Asp
130 135 140

Thr Ser Phe Val Val Asn Glu Glu Ser Trp Asn Lys Asp Thr Trp Glu
145 150 155 160

Ser Cys Gln Ala Asn Ala Val Ser Ala Tyr Cys Gly Ser Lys Lys Phe
165 170 175

Ala Glu Lys Thr Ala Trp Asp Phe Leu Glu Glu Asn Gln Ser Ser Ile
180 185 190

Lys Phe Thr Leu Ser Thr Ile Asn Pro Gly Phe Val Phe Gly Pro Gln
195 200 205

Leu Phe Ala Asp Ser Leu Arg Asn Gly Ile Asn Ser Ser Ser Ala Ile
210 215 220

Ile Ala Asn Leu Val Ser Tyr Lys Leu Gly Asp Asn Phe Tyr Asn Tyr
225 230 235 240

Ser Gly Pro Phe Ile Asp Val Arg Asp Val Ser Lys Ala His Leu Leu
245 250 255

Ala Phe Glu Lys Pro Glu Cys Ala Gly Gln Arg Leu Phe Leu Cys Glu
260 265 270

Asp Met Phe Cys Ser Gln Glu Ala Leu Asp Ile Leu Asn Glu Glu Phe
275 280 285

Pro Gln Leu Lys Gly Lys Ile Ala Thr Gly Glu Pro Gly Ser Gly Ser
290 295 300

Thr Phe Leu Thr Lys Asn Cys Cys Lys Cys Asp Asn Arg Lys Thr Lys
305 310 315 320

Asn Leu Leu Gly Phe Gln Phe Asn Lys Phe Arg Asp Cys Ile Val Asp
325 330 335

Thr Ala Ser Gln Leu Leu Glu Val Gln Ser Lys Ser
340 345

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1044 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: mRNA

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 






(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

AUGACUACUG AAAAAACCGU UGUUUUUGUU UCUGGUGCUA CUGGUUUCAU UGCUCUACAC 60

GUAGUGGACG AUUUAUUAAA AACUGGUUAC AAGGUCAUCG GUUCGGGUAG GUCCCAAGAA 120

AAGAAUGAUG GAUUGCUGAA AAAAUUUAAG AGCAAUCCCA ACCUUUCAAU GGAGAUUGUC 180

GAAGACAUUG CUGCUCCAAA CGCUUUUGAC AAAGUUUUUC AAAAGCACGG CAAAGAGAUC 240

AAGGUUGUCU UGCACAUAGC UUCUCCGGUU CACUUCAACA CCACUGAUUU CGAAAAGGAU 300

CUGCUAAUUC CUGCUGUGAA UGGUACCAAG UCCAUUCUAG AAGCAAUCAA AAAUUAUGCC 360

GCAGACACAG UCGAAAAAGU CGUUAUUACU UCUUCUGUUG CUGCCCUUGC AUCUCCCGGA 420

GAUAUGAAGG ACACUAGUUU CGUUGUCAAU GAGGAAAGUU GGAACAAAGA UACUUGGGAA 480

AGUUGUCAAG CUAACGCGGU UUCCGCAUAC UGUGGUUCCA AGAAAUUUGC UGAAAAAACU 540

GCUUGGGAUU UUCUCGAGGA AAACCAAUCA AGCAUCAAAU UUACGCUAUC AACCAUCAAC 600

CCAGGAUUUG UUUUUGGCCC UCAGCUAUUU GCCGACUCUC UUAGAAAUGG AAUAAAUAGC 660

UCUUCAGCCA UUAUUGCCAA UUUGGUUAGU UAUAAAUUAG GCGACAAUUU UUAUAAUUAC 720

AGUGGUCCUU UUAUUGACGU UCGCGAUGUU UCAAAAGCUC AUUUACUUGC AUUUGAGAAA 780

CCCGAAUGCG CUGGCCAAAG ACUAUUCUUA UGUGAAGAUA UGUUUUGCUC UCAAGAAGCG 840

CUGGAUAUCU UGAAUGAGGA AUUUCCACAG UUAAAAGGCA AGAUAGCAAC UGGCGAACCU 900

GGUAGCGGCU CAACCUUUUU GACAAAAAAC UGCUGCAAGU GCGACAACCG CAAAACCAAA 960

AAUUUAUUAG GAUUCCAAUU UAAUAAGUUC AGAGAUUGCA UUGUCGAUAC UGCCUCGCAA 1020

UUACUAGAAG UUCAAAGUAA AAGC 1044
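Each mRNA listing (SEQ ID NO: 12 and SEQ ID NO: 15) is the coding strand of the corresponding genomic CDS (SEQ ID NO: 10 and SEQ ID NO: 13) written with U in place of T, and each protein listing (SEQ ID NO: 11 and SEQ ID NO: 14) is its codon-by-codon translation. The minimal Python sketch below illustrates that relationship on the first 16 codons of SEQ ID NO: 13. It assumes the standard genetic code, and the helper names `dna_to_mrna` and `translate` are hypothetical, not part of the disclosure.

```python
# Standard genetic code (assumed: nuclear code, NCBI translation table 1).
# Codons are enumerated with bases ordered T, C, A, G at positions 1, 2, 3.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def dna_to_mrna(dna: str) -> str:
    """Write a coding-strand DNA sequence as mRNA (T -> U)."""
    return dna.upper().replace("T", "U")


def translate(seq: str) -> str:
    """Translate a DNA or mRNA coding sequence into one-letter amino acids.

    A '*' in the result marks a stop codon.
    """
    dna = seq.upper().replace("U", "T")
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# First 16 codons of SEQ ID NO: 13 (S. cerevisiae YGL039W).
cds = "ATGACTACTGAAAAAACCGTTGTTTTTGTTTCTGGTGCTACTGGTTTC"
print(dna_to_mrna(cds))
print(translate(cds))  # → MTTEKTVVFVSGATGF
```

Translating a full CDS the same way should reproduce the matching protein listing. A `*` in the middle of the output, for example from a TAG misread in place of TAC, points to a transcription error in the listing rather than a real stop codon.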




Claims 

1. A substantially pure ketoreductase protein having the amino acid sequence which is SEQ ID NO:2.

2. An isolated nucleic acid compound encoding the protein of Claim 1, said protein having the amino acid sequence
which is SEQ ID NO:2.

3. An isolated nucleic acid compound encoding the protein of Claim 1, wherein said compound has a sequence
selected from the group consisting of:


(a) SEQ ID NO:1; or

(b) SEQ ID NO:3.

4. An isolated nucleic acid compound of Claim 3 wherein the sequence of said compound is SEQ ID NO:1.


5. An isolated nucleic acid compound having a sequence complementary to SEQ ID NO:1.

6. An isolated nucleic acid compound of Claim 3 wherein the sequence of said compound is SEQ ID NO:3. 
7. An isolated nucleic acid compound having a sequence complementary to SEQ ID NO:3.

8. A vector comprising an isolated nucleic acid compound of Claim 2. 

9. A vector comprising an isolated nucleic acid compound of Claim 3.


10. A vector of Claim 9, wherein said isolated nucleic acid compound is SEQ ID NO:1 operably linked to a promoter
sequence.

11. A host cell containing the vector of Claim 10.


12. A method for constructing a recombinant host cell having the potential to express SEQ ID NO:2, said method 
comprising introducing into said host cell by any suitable means a vector of Claim 9.

13. A method for expressing SEQ ID NO:2 in the recombinant host cell of Claim 12, said method comprising culturing
said recombinant host cell under conditions suitable for gene expression.

14. A method for reducing a ketone in a stereospecific manner comprising providing a quantity of a suitable ketone to
a culture of recombinant cells for a suitable period of time, wherein said cells are transformed with a vector that 
carries a ketoreductase gene, and wherein said cells express said ketoreductase gene. 


15. A method as in Claim 14, wherein said gene is selected from the group consisting of SEQ ID NO:1, SEQ ID NO:4,
SEQ ID NO:7, SEQ ID NO:10, and SEQ ID NO:13.

16. A method as in Claim 14, wherein said ketone comprises an α-ketolactone, an α-ketolactam, or a diketone.


17. A method as in Claim 14, wherein said recombinant cells are selected from the group consisting of S. cerevisiae,
Z. rouxii, and E. coli.
18. A method for reducing a ketone in a stereospecific manner comprising mixing a quantity of a suitable ketone with
a substantially purified ketoreductase and suitable reducing agent. 

19. A method as in Claim 18, wherein said ketoreductase is selected from the group consisting of SEQ ID NO:2, SEQ
ID NO:5, SEQ ID NO:8, SEQ ID NO:11, and SEQ ID NO:14.

20. An isolated nucleic acid compound that encodes a protein having ketoreductase activity wherein said nucleic acid
hybridizes under high stringency conditions to SEQ ID NO:1, SEQ ID NO:4, SEQ ID NO:7, SEQ ID NO:10, or SEQ
ID NO:13.


21. A method as in Claim 18, wherein said reducing agent is NADPH.